Finding DITA Elements with no Content

mstrubberg
Posts: 48
Joined: Sat Jan 26, 2013 6:07 pm

Post by mstrubberg » Fri Mar 29, 2013 4:57 pm

I want to locate any element that contains no text, for example:
<p> </p>
<ul><li> </li></ul>
<note> </note>

I found another post that suggested using Find/Replace with the regular expression <([^</>"=]+?)>(\s*?)</(.+?)>

I added a couple empty <p> </p> and <note> </note> tags in a topic.
In the project, I selected the topic and performed Find/Replace in Files.
In the Text to Find field, I entered <([^</>"=]+?)>(\s*?)</(.+?)>
I enabled Regular Expression.
The find did not locate the empty paragraph or note tags.

What did I do wrong, or is there another way to locate tags with no content?

adrian
Posts: 2580
Joined: Tue May 17, 2005 4:01 pm

Re: Finding DITA Elements with no Content

Post by adrian » Fri Mar 29, 2013 5:37 pm

Hi,

It should work with that expression. Did it find <li> </li>, or did it return no results?
Make sure you save the file after modifying it. Find/Replace in Files searches the file as saved in the file system (that's why it prompts you to save all files); it does not search the content open in the editor.

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

mstrubberg
Posts: 48
Joined: Sat Jan 26, 2013 6:07 pm

Re: Finding DITA Elements with no Content

Post by mstrubberg » Fri Mar 29, 2013 6:05 pm

Adrian,

The <([^</>"=]+?)>(\s*?)</(.+?)> regular expression found none of the empty tags I added. Yes, I did save the file before performing Find/Replace in Files on the topic selected under the project in the project pane.

Are you successful in performing the same search? If yes, can you provide screen shots showing the positive result?

adrian
Posts: 2580
Joined: Tue May 17, 2005 4:01 pm

Re: Finding DITA Elements with no Content

Post by adrian » Fri Mar 29, 2013 6:30 pm

Yes, it worked for me. Check your email, I've sent you a screenshot.

So, did the find operation return any results at all for you?
If it says "Scanned files: 0", check your file filter.

Let me know if this persists for you.

Regards,
Adrian

mstrubberg
Posts: 48
Joined: Sat Jan 26, 2013 6:07 pm

Re: Finding DITA Elements with no Content

Post by mstrubberg » Fri Mar 29, 2013 7:04 pm

Your screenshot shows spaces between the start and end tags, for example:
<p> </p>
<note> </note>

When I ADDED spaces between the element start and end tags, then the find worked.

However, if the tags do NOT have any spaces between them, in author view the paragraph tag looks like this:
<p></p>
The same tag in editor view looks like this:
<p/>
So how can I search on ANY tag that in text editor appears as an "empty" tag <p/>?

adrian
Posts: 2580
Joined: Tue May 17, 2005 4:01 pm

Re: Finding DITA Elements with no Content

Post by adrian » Fri Mar 29, 2013 8:48 pm

Hi,

I copied the tags from your first post. I thought they were supposed to have spaces.
Anyway, to also cover empty tags, search for:
(<([^</>"=]+?)>(\s*?)</(.+?)>)|(<([^</>"=]+?)/>)
The last part handles the empty tags.
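
For anyone who wants to sanity-check the expression outside the editor, here is a minimal Python sketch. It assumes Python's `re` module treats this pattern the same way as the Find/Replace engine, which has not been verified for every edge case. The helper name `find_empty_tags` and the sample string are made up for illustration.

```python
import re

# The expression from this thread: the first alternative matches a
# start tag and end tag with only whitespace (or nothing) between them,
# the second alternative matches a self-closing tag such as <p/>.
EMPTY_TAG = re.compile(r'(<([^</>"=]+?)>(\s*?)</(.+?)>)|(<([^</>"=]+?)/>)')


def find_empty_tags(text):
    """Return every substring of text matched by EMPTY_TAG."""
    return [m.group(0) for m in EMPTY_TAG.finditer(text)]


sample = '<body><p> </p><p></p><p/><note>\n</note><p>text</p></body>'
print(find_empty_tags(sample))
```

Note that the character class excludes `"` and `=`, so a start tag that carries attributes is never matched by this expression.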

Regards,
Adrian

mstrubberg
Posts: 48
Joined: Sat Jan 26, 2013 6:07 pm

Re: Finding DITA Elements with no Content

Post by mstrubberg » Fri Mar 29, 2013 9:07 pm

Adrian,

SUCCESS! The search string also worked across multiple selected files in a project....PERFECT!

Thanks for the help.... off to study regular expressions!

adrian
Posts: 2580
Joined: Tue May 17, 2005 4:01 pm

Re: Finding DITA Elements with no Content

Post by adrian » Mon Apr 01, 2013 10:54 am

Hi,

When searching XML in Text mode (or with Find in Files), you can also use an XPath expression to detect this situation. For XML this is actually more appropriate than tinkering with a complex regular expression that is only text-aware.
e.g.
Text to find: .*
XPath: //*[not(text())]
Enable Regular expression.
This XPath translates to: all elements that do not contain a text node. This is easier to describe and comprehend than a sequence of <, > and / characters from a regular expression. In addition, XPath doesn't care whether the empty element is represented as an empty tag (<p/>) or as consecutive start and end tags (<p></p>).
You still need the regular expression option, but it's only necessary for the expression that matches entire regions (.*).

Note that there are some differences in the results between this and the complex regular expression we've discussed before. For nested elements that contain no text (like <ul><li></li></ul>) this will match the entire parent element (<ul>...</ul>). Also, this will not match elements that contain even a space character (<li> </li>) because that's considered a text node, but I'm guessing you didn't want that anyway.
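
The differences described above can be reproduced outside the editor with a rough Python sketch. The standard library's ElementTree does not support this predicate in its limited XPath subset, so the sketch emulates `//*[not(text())]` by hand. The helper names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_text_node(elem):
    """Emulate the XPath test text(): does elem have a text-node child?

    In ElementTree the direct text nodes of an element are elem.text
    plus the tail of each child element.
    """
    if elem.text:
        return True
    return any(child.tail for child in elem)


def elements_without_text(xml_source):
    """Return the tag names selected by //*[not(text())], in document order."""
    root = ET.fromstring(xml_source)
    return [e.tag for e in root.iter() if not has_text_node(e)]


sample = '<body><p/><p></p><ul><li></li></ul><note> </note><p>text</p></body>'
print(elements_without_text(sample))
```

In this sample the wrapper elements `body` and `ul` are reported as well, because they have no text node of their own, while `<note> </note>` is skipped because a single space still counts as a text node.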

Regards,
Adrian

mstrubberg
Posts: 48
Joined: Sat Jan 26, 2013 6:07 pm

Re: Finding DITA Elements with no Content

Post by mstrubberg » Mon Apr 01, 2013 2:59 pm

After a little research on regular expressions, I wrote this expression that also found empty elements in selected files.

<[a-z]+/>|<[a-z]+></[a-z]+>

adrian
Posts: 2580
Joined: Tue May 17, 2005 4:01 pm

Re: Finding DITA Elements with no Content

Post by adrian » Mon Apr 01, 2013 3:13 pm

Sure, that would work as long as all your element names are alphabetical. But this doesn't correctly cover all XML element names (no digits, no hyphens, etc).
e.g. Just a few examples for which this doesn't work:

Code: Select all

<section2></section2>
<email-address></email-address>
<email_address></email_address>
I would still recommend XPath for such things over any regular expression.
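
To illustrate the limitation, here is a small Python sketch that compares the `[a-z]+` expression with a broader one. The `BROAD` pattern is only an assumption for illustration: it approximates XML names with ASCII letters, digits, underscores, hyphens and periods, and it still ignores attributes and namespace prefixes, so it is not a complete XML name matcher.

```python
import re

# The expression posted above: lowercase letters only.
NARROW = re.compile(r'<[a-z]+/>|<[a-z]+></[a-z]+>')

# Hypothetical broader variant (ASCII approximation of XML names).
# The backreference \2 also requires the end tag to repeat the start tag name.
BROAD = re.compile(r'<([A-Za-z_][\w.\-]*)\s*/>|<([A-Za-z_][\w.\-]*)></\2>')


def matches(pattern, text):
    """True if pattern matches anywhere in text."""
    return pattern.search(text) is not None


samples = [
    '<section2></section2>',
    '<email-address></email-address>',
    '<email_address></email_address>',
]
for s in samples:
    print(s, matches(NARROW, s), matches(BROAD, s))
```

Even with the broader name class, the regular expression remains purely text-based, which is why XPath is still the safer choice here.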

Regards,
Adrian

mstrubberg
Posts: 48
Joined: Sat Jan 26, 2013 6:07 pm

Re: Finding DITA Elements with no Content

Post by mstrubberg » Mon Apr 01, 2013 3:56 pm

Using the XPath expression you offered, I opened a map in resolved view and ran Find in Files with it. The results included all topicrefs, xrefs, conrefs, and legitimately empty elements (like othermeta): elements that contain no text but are not the content-free tags we are looking for. Authors sometimes insert elements inadvertently, remove the text from an element but leave the tags, or stack multiple empty elements to create spacing, and those are the situations we try to locate and remove.

adrian
Posts: 2580
Joined: Tue May 17, 2005 4:01 pm

Re: Finding DITA Elements with no Content

Post by adrian » Mon Apr 01, 2013 5:05 pm

I see what you mean. That XPath expression was too broad: it didn't take descendants or attributes into account.

Here is a narrower XPath expression that's closer to what you need:

Code: Select all

//*[not(descendant-or-self::*/text()) and not(descendant-or-self::*/attribute::*)]
This XPath translates to: all elements for which neither the element itself nor any of its descendants has a text node or an attribute.
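
The effect of the narrower expression can be checked with a minimal Python sketch that emulates it using the standard library's ElementTree (which cannot evaluate this XPath directly). The helper names and the sample topic are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_text_node(elem):
    """Direct text nodes are elem.text plus the tail of each child."""
    if elem.text:
        return True
    return any(child.tail for child in elem)


def is_bare(elem):
    """True if neither elem nor any descendant has a text node or an attribute."""
    return all(not has_text_node(e) and not e.attrib for e in elem.iter())


def bare_elements(xml_source):
    """Return the tag names of all bare elements, in document order."""
    root = ET.fromstring(xml_source)
    return [e.tag for e in root.iter() if is_bare(e)]


sample = (
    '<topic id="t1"><body><p/><ul><li></li></ul>'
    '<othermeta name="a" content="b"/><xref href="x.dita"/>'
    '<p>text</p></body></topic>'
)
print(bare_elements(sample))
```

In this sample, `othermeta` and `xref` are no longer reported because they carry attributes, while the empty `p`, `ul` and `li` still are.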

Regards,
Adrian

lopresti
Posts: 12
Joined: Mon Mar 24, 2014 11:22 pm

Re: Finding DITA Elements with no Content

Post by lopresti » Wed May 25, 2016 1:49 am

I'm trying to create a Schematron rule to do the same thing, find DITA elements that have no content. Using the rule below I can find a parent element that has no content, but nothing for child or descendant elements. Ideas?

Code: Select all

<pattern id="avoidEmptyElement" abstract="true">
  <title>Check for empty elements</title>
  <parameters xmlns="http://oxygenxml.com/ns/schematron/params">
    <parameter>
      <name>element</name>
      <desc>Specifies the element to check.</desc>
    </parameter>
    <parameter>
      <name>message</name>
      <desc>The warning message that appears when the element is empty.</desc>
    </parameter>
  </parameters>
  <rule context="$element">
    <assert test="(descendant-or-self::*/text())" role="warn"> $message "$element" </assert>
  </rule>
</pattern>

Radu
Posts: 6579
Joined: Fri Jul 09, 2004 5:18 pm

Re: Finding DITA Elements with no Content

Post by Radu » Wed May 25, 2016 9:45 am

Hi Kate,

Usually in XSLT (and thus also in XPaths used in Schematron) to check for an empty element you use this XPath expression:

Code: Select all

not(*) and not(normalize-space())
meaning that it has no child elements and no non-whitespace text.
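
As a rough illustration of what that test selects, here is a Python sketch that emulates `not(*) and not(normalize-space())` with the standard library's ElementTree. The helper names and the sample are hypothetical, and only the four XML whitespace characters (space, tab, carriage return, line feed) are treated as whitespace, as `normalize-space()` does.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"


def is_empty(elem):
    """Emulate not(*) and not(normalize-space()) for one element."""
    has_children = len(elem) > 0
    string_value = "".join(elem.itertext())
    return not has_children and not string_value.strip(XML_WHITESPACE)


def empty_elements(xml_source):
    """Return the tag names of all empty elements, in document order."""
    root = ET.fromstring(xml_source)
    return [e.tag for e in root.iter() if is_empty(e)]


sample = '<body><p/><note> </note><ul><li>\n</li></ul><p>text</p></body>'
print(empty_elements(sample))
```

Unlike the `text()`-based expressions earlier in the thread, this also reports whitespace-only elements such as `<note> </note>`, and it reports the innermost `li` rather than the enclosing `ul`.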

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
