Reg expression help needed

mike004
Posts: 18
Joined: Fri Feb 18, 2011 2:29 pm

Post by mike004 »

Any regexp experts out there?

I want to find all <section> elements whose title is "Description":

<section id="xxx">
[whitespace, zero or more blank lines]
<title>Description</title>
[XML content]
</section>

And change them to <bridgehead>:

<bridgehead id="xxx">
[whitespace]
<title>Description</title>
[XML content]
</bridgehead>

Sounds easy, but it's not. I can find the opening <section> tag:
<section(.*)>(\n*)(\s*)<title>Description</title>

and replace it with <bridgehead>:
<bridgehead$1>$2$3<title>Description</title>

But the closing </section> tag still remains.
adrian
Posts: 2850
Joined: Tue May 17, 2005 4:01 pm

Re: Reg expression help needed

Post by adrian »

Hello,

Do you need to perform this replacement manually (a one-time thing), or do you need it for an application/script that does this automatically/repeatedly?

To me it seems you primarily need to rename the section elements to bridgehead. In Oxygen you can easily do this with the Rename Element (ALT+SHIFT+R) action from the contextual menu (right-click a tag and choose Refactoring -> Rename Element...). In the "Rename" dialog, choose the "Rename siblings with the same name" option to rename all the sibling elements.


Or, though it would complicate things needlessly, you could use Find/Replace: search for section, replace with bridgehead, set the XPath to //section, enable the XML search options, and make sure only "Element names" is selected in the bottom section.


If for some reason (e.g. you need to do this programmatically) you still want to do this with regular expressions, search for:

<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>
and replace with:

<bridgehead$1>$2<title>$3</title>$4</bridgehead>
Note that this won't work properly if you have nested sections, because the lazy match stops at the first closing </section> tag it finds. For example:

<section>
<section>
</section>
</section>
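
For anyone scripting this outside Oxygen, here is a minimal sketch of the same search and replace using Python's re module. Python is my assumption here (the thread does not name a language), and Python writes back-references as \1 rather than $1. The sample XML strings are made up for illustration. The second half shows the nested-section problem mentioned above.

```python
import re

# The search pattern from this post, unchanged.
PATTERN = re.compile(r"<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>")

# Same replacement, with Python-style back-references (\1 instead of $1).
REPLACEMENT = r"<bridgehead\1>\2<title>\3</title>\4</bridgehead>"

# A flat (non-nested) section converts cleanly.
flat = '<section id="xxx">\n  <title>Description</title>\n  <para>Text</para>\n</section>'
converted = PATTERN.sub(REPLACEMENT, flat)
print(converted)

# With nesting, the lazy match ends at the INNER </section>,
# so the outer element is closed too early and a stray </section> is left over.
nested = (
    "<section>\n<title>Outer</title>\n"
    "<section>\n<title>Inner</title>\n</section>\n"
    "</section>"
)
broken = PATTERN.sub(REPLACEMENT, nested)
print(broken)
```

Note that the pattern as written converts every section regardless of its title; to restrict it to "Description", the third group would have to be replaced by the literal title text.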
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
mike004
Posts: 18
Joined: Fri Feb 18, 2011 2:29 pm

Re: Reg expression help needed

Post by mike004 »

adrian wrote:
If for some reason (e.g. you need to do this programmatically) you still want to do this with regular expressions, search for:

<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>

Thanks, I was able to do this by modifying this expression. The key is the (.|\s) group, which matches any character, including newlines.
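
A small sketch of why (.|\s) is needed, written in Python as an assumption (the thread itself is about Oxygen's Find/Replace). The sample text is made up. By default "." does not match a newline, so a pattern built only from "." cannot span lines. Python's re.DOTALL flag is shown as one alternative; other engines may or may not offer an equivalent option.

```python
import re

text = "<a>line one\nline two</a>"

# By default "." stops at a newline, so this finds nothing.
dot_only = re.search(r"<a>(.*?)</a>", text)

# (.|\s) also accepts the newline, because \s covers whitespace characters.
dot_or_space = re.search(r"<a>((.|\s)*?)</a>", text)

# re.DOTALL makes "." itself match newlines (a Python-specific flag).
dotall = re.search(r"<a>(.*?)</a>", text, flags=re.DOTALL)

print(dot_only)
print(dot_or_space.group(1) == dotall.group(1))
```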

One final question:

What does (.*?) mean?
.* means match any character, except new-line, 0 or more times.
I guess the ? quantifier means just take the 1st match, or no match?
adrian
Posts: 2850
Joined: Tue May 17, 2005 4:01 pm

Re: Reg expression help needed

Post by adrian »

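* means greedy match 0 or more times - it will consume as much as it can
*? means lazy match 0 or more times - it will consume as little as it can

So the ? after * does not mean "first match or no match"; it only switches the quantifier from greedy to lazy.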

Practical example:
Given the string:
aabbaabb

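Searching a.*b will match the entire string: aabbaabb
Searching a.*?b will match only aab, and it does so twice: once at the start of the string (aab|baabb) and once more starting at the fifth character (aabb|aab|b)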
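
The same comparison can be checked with a few lines of Python (an assumption on my part; any engine with lazy quantifiers should behave the same way for this input):

```python
import re

text = "aabbaabb"

# Greedy: .* grabs as much as possible, then backs off to the last "b".
greedy = re.findall(r"a.*b", text)

# Lazy: .*? grabs as little as possible, stopping at the first "b" it can reach.
lazy = re.findall(r"a.*?b", text)

print(greedy)  # ['aabbaabb']
print(lazy)    # ['aab', 'aab']
```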