Searching for a <li> tag that didn't close

Oxygen general issues.
BenDupre
Posts: 2
Joined: Mon Jun 08, 2020 9:02 pm

Post by BenDupre »

I tried this regex in the search box
<li>.*?</(?=ul>)
looking for an unclosed <li> which I seem to have somewhere in my data set.
The *? quantifier is acting greedy and grabbing everything up to the closing </ul> tag. Can anyone tell me why? Or how to fix?
THANKS
adrian
Posts: 2855
Joined: Tue May 17, 2005 4:01 pm

Re: Searching for a <li> tag that didn't close

Post by adrian »

Hi,

The pattern <li>.*?</(?=ul>) matches text that starts with "<li>" and ends at the first following "</" that is immediately followed by "ul>" (the lookahead keeps "ul>" itself out of the match). So it is not greedy. Greedy means choosing the longest string that can match, expanding to the right as far as possible (up to the last "</ul>" occurrence). What this pattern actually finds starts at the first "<li>" and ends at the first "</ul>", running across any "</li>" in between because nothing in the pattern stops at it. That is not what you seem to be looking for.
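
To make the lazy versus greedy difference concrete, here is a minimal sketch in Python. Its `re` module handles these particular patterns the same way for this purpose, though Oxygen's own search uses a different regex engine. The sample string is made up for illustration.

```python
import re

# Hypothetical sample: the second <li> is missing its closing tag.
sample = "<ul><li>one</li><li>two</ul><ul><li>three</li></ul>"

# Lazy quantifier: stops at the FIRST "</" that is followed by "ul>".
lazy = re.search(r"<li>.*?</(?=ul>)", sample)

# Greedy quantifier: runs to the LAST "</" that is followed by "ul>".
greedy = re.search(r"<li>.*</(?=ul>)", sample)

print(lazy.group(0))    # <li>one</li><li>two</
print(greedy.group(0))  # <li>one</li><li>two</ul><ul><li>three</li></
```

Both matches start at the first "<li>", including the correctly closed one, which is why the lazy version still looks like it grabs too much.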

Anyway, what you want is to exclude correctly closed <li>...</li> pairs from the match. So, check the box for the option [x] Dot matches all and try:

Code:

<li>((?!</li>).)*(?=(</ul>|<li>))
Here's a breakdown of the regex:
  • <li>: matches the <li> tag
  • ((?!</li>).)*: matches any single character, zero or more times, as long as a </li> tag does not start at that character. The negative lookahead (?!</li>) is checked before each character is consumed, so the match can never run across a </li>. This is what excludes the correctly closed <li>...</li> pairs.
  • (?=(</ul>|<li>)): a zero-width positive lookahead that requires either the </ul> end tag or a new <li> start tag to come next, without including it in the match.
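
As a rough illustration of the breakdown above, the same pattern can be tried outside the editor. This is only a sketch in Python, where `re.DOTALL` is assumed to play the role of the "Dot matches all" checkbox. The sample data is hypothetical.

```python
import re

# Same pattern as above; re.DOTALL lets "." match newlines too.
PATTERN = re.compile(r"<li>((?!</li>).)*(?=(</ul>|<li>))", re.DOTALL)

# Hypothetical sample: the <li> on line 3 is never closed.
sample = "<ul>\n<li>one</li>\n<li>two\n<li>three</li>\n</ul>"

matches = []
for m in PATTERN.finditer(sample):
    # Convert the match offset to a 1-based line number.
    line = sample.count("\n", 0, m.start()) + 1
    matches.append((line, m.group(0)))

print(matches)  # [(3, '<li>two\n')]
```

The correctly closed items on lines 2 and 4 are skipped, because the only thing that follows their content is a </li>, which the final lookahead rejects.
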
BTW, have you considered using Oxygen's XML validation (or Check Well-Formedness) operation to identify XML well-formedness errors?
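
For comparison, this is roughly what a well-formedness check does with such data. The sketch below is not Oxygen's operation. It swaps in Python's standard `xml.etree.ElementTree` parser as a stand-in, and the helper name `first_wellformedness_error` and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <li> on line 3 is never closed.
sample = "<ul>\n<li>one</li>\n<li>two\n</ul>"

def first_wellformedness_error(text):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

error = first_wellformedness_error(sample)
print(error)
```

Note that a parser reports the place where it notices the mismatch (here the </ul> on line 4), not the line where the unclosed <li> was opened, so the regex search can still be handy for pinpointing the start.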

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com