Searching for a <li> tag that didn't close
I tried this regex in the search box, looking for an unclosed <li> that I seem to have somewhere in my data set:
<li>.*?</(?=ul>)
The *? quantifier is acting greedy and grabbing everything up to the closing </ul> tag. Can anyone tell me why, or how to fix it?
THANKS
Ben Dupre
"The greatest problem with communication is the illusion that it has been achieved." -- GB Shaw
Re: Searching for a <li> tag that didn't close
Hi,
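This: <li>.*?</(?=ul>) matches text that starts with "<li>" and ends at the first "</" that is followed by "ul>". The "ul>" itself is not part of the match because it sits inside the lookahead. So it's not greedy. Greedy would mean choosing the longest string that can match, expanding to the right as far as possible (up to the last "</ul>"). The lazy *? only keeps the match as short as possible from a given starting point: the engine still starts at the first "<li>" that can lead to a match, and nothing in the pattern makes it stop at a "</li>". So what you actually get is everything from the first "<li>" to the first "</ul>", which is not what you're looking for.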
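A minimal sketch of the difference, using Python's re module. This is an assumption for illustration only: lazy and greedy quantifiers and lookaheads are assumed to behave here as they do in the editor's search box, and the sample string is made up.

```python
import re

# Made-up sample on one line: "one" is closed, "two" is not.
sample = "<ul><li>one</li><li>two</ul><ul><li>three</li></ul>"

# The original (lazy) pattern and a greedy variant for comparison.
lazy = re.search(r"<li>.*?</(?=ul>)", sample)
greedy = re.search(r"<li>.*</(?=ul>)", sample)

print(lazy.group())    # <li>one</li><li>two</
print(greedy.group())  # <li>one</li><li>two</ul><ul><li>three</li></
```

Both matches start at the first <li>, even though that item is properly closed; only the end point differs. The lazy version stops at the first "</ul>", the greedy one at the last.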
Anyway, what you want is to exclude the properly closed <li>...</li> pairs from the match. So, check the box for the option [x] Dot matches all and try:
<li>((?!</li>).)*(?=(</ul>|<li>))
Here's a breakdown of the regex:
- <li>: matches the <li> start tag.
- ((?!</li>).)*: matches any character that is not the start of a </li> tag, zero or more times. Before each character is consumed, the negative lookahead (?!</li>) checks that "</li>" does not begin at the current position, so the match can never run past a closing </li>.
- (?=(</ul>|<li>)): a zero-width positive lookahead that requires the match to be followed by either the </ul> end tag or a new <li> start tag, without consuming it. Since a properly closed item is followed by </li> rather than </ul> or <li>, correct <li>...</li> pairs are never matched.
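One way to check the pattern outside the editor is a small Python sketch. The sample data is hypothetical, re.DOTALL stands in for the Dot matches all option, and find_unclosed_li is a helper name made up for this example.

```python
import re
from typing import List

# Adrian's pattern; re.DOTALL corresponds to the "Dot matches all" option.
UNCLOSED_LI = re.compile(r"<li>((?!</li>).)*(?=(</ul>|<li>))", re.DOTALL)
# Same pattern without DOTALL: "." no longer matches line breaks.
UNCLOSED_LI_NO_DOTALL = re.compile(UNCLOSED_LI.pattern)


def find_unclosed_li(text: str, pattern=UNCLOSED_LI) -> List[str]:
    """Hypothetical helper: return the matched text for each <li> not closed by </li>."""
    return [m.group() for m in pattern.finditer(text)]


# Hypothetical data: the "two" and "four" items are missing their </li>.
sample = """<ul>
  <li>one</li>
  <li>two
  <li>three</li>
  <li>four
</ul>"""

print(find_unclosed_li(sample))
# ['<li>two\n  ', '<li>four\n']
print(find_unclosed_li(sample, UNCLOSED_LI_NO_DOTALL))
# []

# Caveat: a closed item that contains a nested list is reported too.
nested = "<ul><li>outer<ul><li>inner</li></ul></li></ul>"
print(find_unclosed_li(nested))
# ['<li>outer<ul>']
```

Without DOTALL nothing is found in this sample, because "." stops at the line break before the lookahead can reach the next <li> or </ul>. The last case shows a limitation: a properly closed <li> that contains a nested <ul> is reported, because the nested <li> satisfies the final lookahead.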
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com