Finding DITA Elements with no Content

Posted: Fri Mar 29, 2013 4:57 pm
by mstrubberg
I want to be able to locate any element that contains no text, for example:
<p> </p>
<ul><li> </li></ul>
<note> </note>

I located another post that suggested using Find and Replace using <([^</>"=]+?)>(\s*?)</(.+?)>

I added a couple empty <p> </p> and <note> </note> tags in a topic.
In the project, I selected the topic and performed Find/Replace in Files.
In the Text to Find field, I entered <([^</>"=]+?)>(\s*?)</(.+?)>
I enabled Regular Expression.
The find did not locate the empty paragraph or note tags.

What did I do wrong, or is there another way to locate tags with no content?

Re: Finding DITA Elements with no Content

Posted: Fri Mar 29, 2013 5:37 pm
by adrian
Hi,

It should work with that expression. Did it find <li> </li>, or did it return no results?
Make sure you save the file after modifying it. Find/Replace in Files searches directly in the file saved on the file system (that's why it prompts you to save all files); it does not search the content open in the editor.

Regards,
Adrian

Re: Finding DITA Elements with no Content

Posted: Fri Mar 29, 2013 6:05 pm
by mstrubberg
Adrian,

The <([^</>"=]+?)>(\s*?)</(.+?)> regular expression found none of the empty tags I added. Yes, I did save the file before performing the Find/Replace in Files on the topic selected under the project in the project pane.

Are you successful in performing the same search? If yes, can you provide screen shots showing the positive result?

Re: Finding DITA Elements with no Content

Posted: Fri Mar 29, 2013 6:30 pm
by adrian
Yes, it worked for me. Check your email, I've sent you a screenshot.

So, did the find operation return any results at all for you?
If it says "Scanned files: 0", check your file filter.

Let me know if this persists for you.

Regards,
Adrian

Re: Finding DITA Elements with no Content

Posted: Fri Mar 29, 2013 7:04 pm
by mstrubberg
Your screenshot shows spaces between the start and end tags, for example:
<p> </p>
<note> </note>

When I ADDED spaces between the element start and end tags, then the find worked.

However, if the tags do NOT have any spaces between them, in author view the paragraph tag looks like this:
<p></p>
The same tag in editor view looks like this:
<p/>
So how can I search for ANY tag that appears in the text editor as an "empty" tag such as <p/>?

Re: Finding DITA Elements with no Content

Posted: Fri Mar 29, 2013 8:48 pm
by adrian
Hi,

I copied the tags from your first post. I thought they were supposed to have spaces.
Anyway, to also cover empty tags, search for:
(<([^</>"=]+?)>(\s*?)</(.+?)>)|(<([^</>"=]+?)/>)
The last part handles the empty tags.
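
As a quick sanity check outside Oxygen, the combined expression can be tried in Python. This is only a sketch: the sample string and the helper name find_empty_elements are made up for illustration, and it assumes Python's re module handles this particular pattern the same way as the Java regex engine used by Oxygen.

```python
import re

# The combined expression from this post, unchanged.
EMPTY_ELEMENT = re.compile(r'(<([^</>"=]+?)>(\s*?)</(.+?)>)|(<([^</>"=]+?)/>)')

# Hypothetical sample content, for illustration only.
SAMPLE = '<body><p> </p><p></p><p/><note> </note><p>text</p><p id="x"/></body>'


def find_empty_elements(text):
    """Return every substring matched by the expression, in document order."""
    return [m.group(0) for m in EMPTY_ELEMENT.finditer(text)]


if __name__ == "__main__":
    for hit in find_empty_elements(SAMPLE):
        print(hit)
```

With this sample, the whitespace-only pairs, the empty pair and the self-closing <p/> are reported, while <p>text</p> is not. The <p id="x"/> element is also skipped, because the character class excludes " and =, so tags that carry attributes never match.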

Regards,
Adrian

Re: Finding DITA Elements with no Content

Posted: Fri Mar 29, 2013 9:07 pm
by mstrubberg
Adrian,

SUCCESS! The search string also worked across multiple selected files in a project....PERFECT!

Thanks for the help.... off to study regular expressions!

Re: Finding DITA Elements with no Content

Posted: Mon Apr 01, 2013 10:54 am
by adrian
Hi,

For searching XML in text mode (or Find In Files) you can also use an XPath expression to detect this situation. For XML this is actually more appropriate than tinkering with a complex regular expression that is only text aware.
e.g.
Text to find: .*
XPath: //*[not(text())]
Enable Regular expression.
This XPath translates to: all elements that do not contain a text node. This is easier to describe and comprehend than a sequence of <, > and / characters in a regular expression. In addition, XPath doesn't care whether the empty element is represented as an empty tag (<p/>) or as consecutive start and end tags (<p></p>).
You still need the regular expression option, but it's only necessary for the expression that matches entire regions (.*).

Note that there are some differences in the results between this and the complex regular expression we've discussed before. For nested elements that contain no text (like <ul><li></li></ul>) this will match the entire parent element (<ul>...</ul>). Also, this will not match elements that contain even a space character (<li> </li>) because that's considered a text node, but I'm guessing you didn't want that anyway.
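
To see which elements a test like not(text()) selects, here is a rough Python sketch. It emulates the predicate with the standard library's ElementTree instead of running the XPath itself, because ElementTree's XPath subset has no not() or text() predicates. The sample document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content, for illustration only.
SAMPLE = (
    "<body><p>Some text</p><p></p><p/>"
    "<ul><li></li></ul><ol><li> </li></ol></body>"
)


def has_text_node(elem):
    """True if elem has at least one text node as a direct child.

    In ElementTree, an element's direct text lives in elem.text (before the
    first child) and in each child's .tail (after that child's end tag).
    """
    if elem.text:
        return True
    return any(child.tail for child in elem)


def elements_without_text_nodes(root):
    """Emulate //*[not(text())]: tags of all elements with no text node child."""
    return [e.tag for e in root.iter() if not has_text_node(e)]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(elements_without_text_nodes(root))
```

For this sample the list contains body, both empty p elements, ul with its li, and ol, but not the li that holds a single space. Container elements such as body and ul show up because they have no text of their own, which is why this test is broad.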

Regards,
Adrian

Re: Finding DITA Elements with no Content

Posted: Mon Apr 01, 2013 2:59 pm
by mstrubberg
After a little research on regular expressions, I wrote this expression that also found empty elements in selected files.

<[a-z]+/>|<[a-z]+></[a-z]+>

Re: Finding DITA Elements with no Content

Posted: Mon Apr 01, 2013 3:13 pm
by adrian
Sure, that would work as long as all your element names consist only of lowercase letters. But it doesn't cover all valid XML element names (names with digits, hyphens, underscores, etc. are not matched).
A few examples for which it doesn't work:

<section2></section2>
<email-address></email-address>
<email_address></email_address>
I would still recommend XPath for such things over any regular expression.
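
To illustrate the limitation, here is a small Python sketch that runs the [a-z]+ expression against those names next to a somewhat broader pattern. The broader name class is my own simplified approximation of XML names (ASCII only, no namespace prefixes), not something taken from the XML specification or from Oxygen.

```python
import re

# The expression from the previous post, unchanged.
NARROW = re.compile(r"<[a-z]+/>|<[a-z]+></[a-z]+>")

# Assumption: a simplified, ASCII-only stand-in for an XML element name.
NAME = r"[A-Za-z_][A-Za-z0-9._-]*"

# Self-closing tag, or a start tag immediately followed by its own end tag.
BROADER = re.compile(rf"<{NAME}/>|<({NAME})></\1>")

SAMPLES = [
    "<p></p>",
    "<p/>",
    "<section2></section2>",
    "<email-address></email-address>",
    "<email_address></email_address>",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        narrow = bool(NARROW.search(sample))
        broader = bool(BROADER.search(sample))
        print(f"{sample}: narrow={narrow} broader={broader}")
```

The narrow pattern only finds the two p samples, while the broader one finds all five. Even the broader version still ignores attributes, whitespace inside the tags, and non-ASCII names, which is part of why XPath is the safer tool here.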

Regards,
Adrian

Re: Finding DITA Elements with no Content

Posted: Mon Apr 01, 2013 3:56 pm
by mstrubberg
Using the XPath expression offered, I opened a map in resolved view and ran Find in Files with that XPath. The results included all topicrefs, xrefs, conrefs, and legitimately empty elements (like othermeta): elements that don't contain text but are not really what you're looking for as tags without content. Authors sometimes inadvertently put in elements, or remove text from elements but not the tags, or use multiple empty elements to try to create spacing, and those are the types of situations we try to locate and remove.

Re: Finding DITA Elements with no Content

Posted: Mon Apr 01, 2013 5:05 pm
by adrian
I see what you mean. That XPath expression was too broad: it didn't look at descendants or attributes.

There's always a better XPath expression that's closer to what you need:

//*[not(descendant-or-self::*/text()) and not(descendant-or-self::*/attribute::*)]
This XPath translates to: all elements for which neither the element itself nor any of its descendants has a text node or an attribute.
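
A rough Python equivalent of that test, again emulated with ElementTree rather than executed as XPath, could look like the sketch below. The sample topic and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content, for illustration only.
SAMPLE = (
    '<topic id="t1"><body>'
    '<p></p><ul><li/></ul><note type="tip"/><p>Text</p>'
    '</body></topic>'
)


def subtree_has_text(elem):
    """True if elem or any of its descendants has a text node child."""
    # itertext() yields the text found inside elem, at any depth.
    return any(piece for piece in elem.itertext())


def subtree_has_attributes(elem):
    """True if elem or any of its descendants carries an attribute."""
    return any(e.attrib for e in elem.iter())


def truly_empty(root):
    """Tags of elements whose whole subtree has neither text nor attributes."""
    return [
        e.tag
        for e in root.iter()
        if not subtree_has_text(e) and not subtree_has_attributes(e)
    ]


if __name__ == "__main__":
    print(truly_empty(ET.fromstring(SAMPLE)))
```

For this sample only the empty p, the ul and its li are reported. The note with a type attribute and the containers that hold text somewhere below them are left out, which matches the intent of skipping elements like topicref or othermeta that are empty but carry attributes.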

Regards,
Adrian

Re: Finding DITA Elements with no Content

Posted: Wed May 25, 2016 1:49 am
by lopresti
I'm trying to create a Schematron rule to do the same thing, find DITA elements that have no content. Using the rule below I can find a parent element that has no content, but nothing for child or descendant elements. Ideas?

<pattern id="avoidEmptyElement" abstract="true">
  <title>Check for empty elements</title>
  <parameters xmlns="http://oxygenxml.com/ns/schematron/params">
    <parameter>
      <name>element</name>
      <desc>Specifies the element to check.</desc>
    </parameter>
    <parameter>
      <name>message</name>
      <desc>The warning message that appears when the element is empty.</desc>
    </parameter>
  </parameters>
  <rule context="$element">
    <assert test="(descendant-or-self::*/text())" role="warn"> $message "$element" </assert>
  </rule>
</pattern>

Re: Finding DITA Elements with no Content

Posted: Wed May 25, 2016 9:45 am
by Radu
Hi Kate,

Usually in XSLT (and thus also in XPaths used in Schematron) to check for an empty element you use this XPath expression:

not(*) and not(normalize-space())
meaning that the element has no child elements and no non-whitespace text.
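
The same emptiness test can be sketched in Python with ElementTree, purely to show what the condition accepts and rejects. The sample and the is_empty helper are hypothetical. Only space, tab, CR and LF are stripped, to mirror what normalize-space() treats as whitespace.

```python
import xml.etree.ElementTree as ET

# The characters XPath's normalize-space() treats as whitespace.
XML_WHITESPACE = " \t\r\n"

# Hypothetical sample content, for illustration only.
SAMPLE = "<body><p> </p><p/><ul><li>\n</li></ul><p>Text</p><p><b/></p></body>"


def is_empty(elem):
    """Emulate not(*) and not(normalize-space()) for a single element."""
    has_children = len(elem) > 0
    text = "".join(elem.itertext())
    return not has_children and text.strip(XML_WHITESPACE) == ""


def empty_elements(root):
    """Tags of all elements in the tree that pass the emptiness test."""
    return [e.tag for e in root.iter() if is_empty(e)]


if __name__ == "__main__":
    print(empty_elements(ET.fromstring(SAMPLE)))
```

Here the whitespace-only p, the self-closing p, the li containing just a newline and the inner b are reported. The ul and the p that wraps b are not, because they have child elements, so only the innermost empty element is flagged.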

Regards,
Radu

Re: Finding DITA Elements with no Content

Posted: Mon May 31, 2021 8:31 am
by syed
Hi,

What is the best way to find empty short descriptions?

Syed

Re: Finding DITA Elements with no Content

Posted: Mon May 31, 2021 10:30 am
by adrian
Hi,
What is the best way to find empty short descriptions?
Try this XPath:

//shortdesc[not(descendant-or-self::text()) or normalize-space(string-join(descendant-or-self::text(),'')) ='']
This will find all shortdesc elements that are empty (no text node at all) and those whose text contains only whitespace.

Note that this doesn't account for content references within the shortdesc.
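
For a batch check outside the editor, the same idea can be sketched in Python with ElementTree. The sample document, the dita wrapper element and the function name are hypothetical, and like the XPath above this does not resolve content references.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content, for illustration only.
SAMPLE = """<dita>
  <topic id="a"><title>A</title><shortdesc></shortdesc></topic>
  <topic id="b"><title>B</title><shortdesc>  </shortdesc></topic>
  <topic id="c"><title>C</title><shortdesc>Summary <ph>text</ph></shortdesc></topic>
  <topic id="d"><title>D</title><shortdesc><ph/></shortdesc></topic>
</dita>"""


def topics_with_empty_shortdesc(root):
    """Return the ids of topics whose shortdesc has no non-whitespace text."""
    ids = []
    for topic in root.iter("topic"):
        shortdesc = topic.find("shortdesc")  # direct child only
        if shortdesc is None:
            continue  # a missing shortdesc is a different problem
        text = "".join(shortdesc.itertext())
        if text.strip(" \t\r\n") == "":
            ids.append(topic.get("id"))
    return ids


if __name__ == "__main__":
    print(topics_with_empty_shortdesc(ET.fromstring(SAMPLE)))
```

With this sample, topics a, b and d are reported: an empty shortdesc, a whitespace-only one, and one that holds only an empty ph element.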

Regards,
Adrian