Reg expression help needed
Any regexp experts out there?
I want to find all <section> elements whose title is "Description":
<section id="xxx">
[whitespace, zero or more blank lines]
<title>Description</title>
[XML content]
</section>
And change them to <bridgehead>:
<bridgehead id="xxx">
[whitespace]
<title>Description</title>
[XML content]
</bridgehead>
Sounds easy, but it's not. I can find the opening <section> tag:
<section(.*)>(\n*)(\s*)<title>Description</title>
and replace it with <bridgehead>:
<bridgehead$1>$2$3<title>Description</title>
But the closing </section> tag still remains.
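A minimal Python sketch of why the closing tag survives, assuming a made-up sample section (the id "d1" and the <para> content are hypothetical). Python's re.sub writes backreferences as \1, \2, \3 rather than the $1, $2, $3 used above; otherwise the pattern is the one quoted. A substitution only rewrites the text it matched, and this pattern stops at the title, so </section> is never part of any match.

```python
import re

# The opening-tag pattern from the post above, unchanged.
open_tag = re.compile(r"<section(.*)>(\n*)(\s*)<title>Description</title>")

# Hypothetical sample input.
doc = """<section id="d1">

  <title>Description</title>
  <para>Some text.</para>
</section>"""

# Python uses \1, \2, \3 for the groups where the post uses $1, $2, $3.
result = open_tag.sub(r"<bridgehead\1>\2\3<title>Description</title>", doc)
print(result)
# The first line becomes <bridgehead id="d1">, but the last line is still </section>.
```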
Re: Reg expression help needed
Hello,
Do you need to perform this replacement manually (a one-time thing), or do you need it for an application/script that does it automatically/repeatedly?
It seems to me you primarily need to rename the section elements to bridgehead. In Oxygen you can easily do this with the Rename Element (ALT+SHIFT+R) action from the contextual menu (right-click a tag and choose Refactoring -> Rename Element...). In the "Rename" dialog, choose the "Rename siblings with the same name" option to rename all the sibling elements.
Alternatively, though it would complicate things needlessly, you could use Find/Replace: search for section, replace with bridgehead, use the XPath //section, set Enable XML search options, and make sure only Element names is selected in the bottom section.
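The element-based renaming described above can also be scripted. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree, not Oxygen's own refactoring; the function name sections_to_bridgeheads and its title_text parameter are hypothetical. Because it works on parsed elements instead of raw text, each start tag and its matching end tag are renamed together, even when sections are nested. It renames only sections whose direct <title> child reads "Description", which is what the original question asked for.

```python
import xml.etree.ElementTree as ET


def sections_to_bridgeheads(xml_text: str, title_text: str = "Description") -> str:
    """Hypothetical helper: rename <section> elements whose direct <title> child matches title_text."""
    root = ET.fromstring(xml_text)
    # Snapshot the matches first so renaming cannot interfere with the traversal.
    for elem in list(root.iter("section")):
        title = elem.find("title")  # direct child <title> only
        if title is not None and (title.text or "").strip() == title_text:
            # Changing .tag renames both the start and the end tag;
            # attributes, children and text are kept as they are.
            elem.tag = "bridgehead"
    return ET.tostring(root, encoding="unicode")


sample = (
    '<article>'
    '<section id="d1"><title>Description</title>'
    '<section id="d2"><title>Description</title><para>Nested.</para></section>'
    '</section>'
    '<section id="s3"><title>Other</title></section>'
    '</article>'
)
converted = sections_to_bridgeheads(sample)
print(converted)
```

ElementTree discards comments and the DOCTYPE declaration when it parses, so for real DocBook files the Oxygen refactoring action is likely the safer route; the sketch mainly shows why a structure-aware rename has no trouble with nesting.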
If for some reason (e.g. you need to do this programmatically) you still want to do this with regular expressions, search for:
<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>
and replace with:
<bridgehead$1>$2<title>$3</title>$4</bridgehead>
Note that this won't work properly if you have nested sections, e.g.:
<section>
  <section>
  </section>
</section>
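A minimal Python sketch of the regular expression above, applied to two hypothetical inputs: a flat section and a nested one. As before, Python writes the groups as \1 to \4 instead of $1 to $4. As written, the pattern matches a section with any title, not only "Description". On the nested input, the lazy ((.|\s)*?) stops at the first </section> it meets, which belongs to the inner section, so the outer start tag ends up paired with the wrong end tag.

```python
import re

# The search pattern and replacement from the post, in Python syntax.
SECTION = re.compile(r"<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>")
REPLACEMENT = r"<bridgehead\1>\2<title>\3</title>\4</bridgehead>"

# Hypothetical inputs.
flat = """<section id="d1">
  <title>Description</title>
  <para>Text.</para>
</section>"""

nested = """<section id="outer">
  <title>Description</title>
  <section id="inner">
    <title>Details</title>
  </section>
  <para>After inner.</para>
</section>"""

flat_result = SECTION.sub(REPLACEMENT, flat)
nested_result = SECTION.sub(REPLACEMENT, nested)

print(flat_result)    # a clean <bridgehead> ... </bridgehead>
print(nested_result)  # the inner </section> became </bridgehead>; the outer </section> is left over
```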
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Re: Reg expression help needed
Thanks, I was able to do this by modifying this expression:
<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>
The key is the (.|\s) expression, which matches "anything", including line breaks.
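A small Python illustration of why (.|\s) matters, using a hypothetical two-line <para>: . on its own does not match a line break, while (.|\s) accepts any character because \s covers the line break. Python's re.DOTALL flag (or the character class [\s\S]) is an equivalent way to get the same effect. One design note: . and \s both match a space, so (.|\s)* gives the engine two ways to consume every space, which can make a failed match on long whitespace runs backtrack heavily; the flag or the character class avoids that ambiguity.

```python
import re

# Hypothetical sample with a line break inside the element.
text = "<para>line one\nline two</para>"

# '.' stops at a line break, so this finds nothing across the two lines.
dot_only = re.search(r"<para>(.*?)</para>", text)

# (.|\s) also accepts whitespace, including '\n', so it can span lines.
dot_or_space = re.search(r"<para>((.|\s)*?)</para>", text)

# Python-specific equivalent: re.DOTALL lets '.' match '\n' as well.
dotall = re.search(r"<para>(.*?)</para>", text, re.DOTALL)

print(dot_only)  # None
print(dot_or_space.group(1))
print(dotall.group(1))
```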
One final question:
What does (.*?) mean?
.* means match any character, except new-line, 0 or more times.
I guess the ? quantifier means just take the 1st match, or no match?
Re: Reg expression help needed
* means greedy match 0 or more times - it will consume as much as it can
*? means lazy match 0 or more times - it will consume as little as it can
Practical example:
Given the string:
aabbaabb
Searching a.*b will match the entire string: aabbaabb
Searching a.*?b will match only aab: it stops at the first b after the a.
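The same example in Python, as a quick way to try it (assuming Python's re module, whose * and *? behave the same way here). re.search returns the first match only, while re.findall lists every non-overlapping match. The ? after * does not limit how many matches are found; it only changes how much each match consumes, which is why findall still finds aab twice.

```python
import re

s = "aabbaabb"

# Greedy: .* grabs everything, then backs off until it can end on the last 'b'.
greedy = re.search(r"a.*b", s).group()

# Lazy: .*? grabs as little as possible and stops at the first 'b'.
lazy = re.search(r"a.*?b", s).group()

print(greedy)                   # aabbaabb
print(lazy)                     # aab
print(re.findall(r"a.*?b", s))  # ['aab', 'aab']
```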