Reg expression help needed
Posted: Fri Mar 25, 2011 12:11 pm
by mike004
Any regexp experts out there?
I want to find every <section> whose title is "Description":
<section id="xxx">
[whitespace, zero or more blank lines]
<title>Description</title>
[XML content]
</section>
And change them to <bridgehead>:
<bridgehead id="xxx">
[whitespace]
<title>Description</title>
[XML content]
</bridgehead>
Sounds easy, but it's not. I can find the opening <section> tag:
<section(.*)>(\n*)(\s*)<title>Description</title>
and replace it with <bridgehead>:
<bridgehead$1>$2$3<title>Description</title>
But the closing </section> tag still remains.
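For anyone who wants to reproduce the problem outside the editor, here is a minimal Python sketch of the opening-tag-only replacement above. The sample document and variable names are made up for illustration, and Python writes the $1 style backreferences as \1.

```python
import re

# Hypothetical sample input, not taken from the original post.
sample = (
    '<section id="xxx">\n'
    '  <title>Description</title>\n'
    '  <para>Some text.</para>\n'
    '</section>'
)

# The opening-tag pattern from the post; Python uses \1 where the post uses $1.
opening = re.compile(r"<section(.*)>(\n*)(\s*)<title>Description</title>")
result = opening.sub(r"<bridgehead\1>\2\3<title>Description</title>", sample)

# The opening tag is renamed, but the closing </section> is left untouched.
print(result)
```

The printed text starts with a bridgehead tag and still ends with </section>, which is the mismatch described above.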
Re: Reg expression help needed
Posted: Fri Mar 25, 2011 1:47 pm
by adrian
Hello,
Do you need to perform this replacement manually (a one-time thing), or do you need it for an application/script that does this automatically/repeatedly?
To me it seems you primarily need to rename the section elements to bridgehead. In Oxygen you can easily do this with the Rename Element (ALT+SHIFT+R) action from the contextual menu (right-click on a tag and choose Refactoring -> Rename Element...). In the "Rename" dialog, choose the "Rename siblings with the same name" option to rename all the sibling elements.
Or, though it would complicate things needlessly, you could use Find/Replace: search for section, replace with bridgehead, use the XPath //section, turn on Enable XML search options, and make sure only Element names is selected in the bottom section.
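If the rename has to happen outside Oxygen, the same element-level idea could be sketched with an XML parser instead of the editor. This is only a rough illustration using Python's standard xml.etree.ElementTree; the sample document and the function name are hypothetical, and unlike the //section rename above it only touches sections whose title reads "Description".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a nested section, made up for illustration.
sample = (
    '<chapter><section id="a"><title>Description</title>'
    '<section id="b"><title>Other</title></section>'
    '</section></chapter>'
)

def rename_description_sections(root):
    """Rename each <section> whose <title> child reads "Description"."""
    # Collect the elements first, then rename them in place.
    for element in list(root.iter("section")):
        title = element.find("title")
        if title is not None and (title.text or "").strip() == "Description":
            element.tag = "bridgehead"
    return root

root = rename_description_sections(ET.fromstring(sample))
renamed = ET.tostring(root, encoding="unicode")
print(renamed)
```

Because the parser works on elements rather than text, the opening and closing tags are renamed together, even when sections are nested.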
If for some reason (e.g. you need to do this programmatically) you still want to do this with regular expressions, search for:
<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>
and replace with:
<bridgehead$1>$2<title>$3</title>$4</bridgehead>
Note that this won't work properly if you have nested sections, because the lazy match ends at the first </section> it finds, which is the inner one. E.g.:
<section>
<section>
</section>
</section>
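A minimal Python sketch of this search and replace, in case it helps to try it in a script. The pattern is the one given above; the two sample strings and the function name are made up for illustration, and Python writes $1 as \1.

```python
import re

# The search pattern from this post, unchanged.
PATTERN = re.compile(r"<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>")
# Same replacement, with Python-style backreferences (\1 instead of $1).
REPLACEMENT = r"<bridgehead\1>\2<title>\3</title>\4</bridgehead>"

def sections_to_bridgeheads(xml_text):
    """Rename each matched <section>...</section> pair to <bridgehead>."""
    return PATTERN.sub(REPLACEMENT, xml_text)

# Hypothetical flat input: the pair is renamed as intended.
flat = '<section id="xxx">\n  <title>Description</title>\n  <para>Some text.</para>\n</section>'
flat_result = sections_to_bridgeheads(flat)

# Hypothetical nested input: the outer opening tag gets paired with the inner closing tag.
nested = "<section><title>Description</title><section><title>Inner</title></section></section>"
nested_result = sections_to_bridgeheads(nested)

print(flat_result)
print(nested_result)
```

With the nested input, the result contains an inner <section> closed by </bridgehead> and a stray </section> at the end, which is the failure mentioned above.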
Regards,
Adrian
Re: Reg expression help needed
Posted: Fri Mar 25, 2011 2:37 pm
by mike004
If for some reason (e.g. you need to do this programmatically) you still want to do this with regular expressions, search for:
<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>
Thanks, I was able to do this by modifying this expression. The key is the (.|\s) group, which matches "anything", including line breaks.
One final question:
What does (.*?) mean?
.* means match any character, except new-line, 0 or more times.
I guess the ? quantifier means just take the 1st match, or no match?
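To make the point about "." and new-lines concrete, here is a small Python sketch. The sample text is made up, and re.DOTALL is Python's own flag, shown only as an assumption about how one might do this in a script rather than anything specific to the editor.

```python
import re

# Hypothetical two-line sample.
text = "<title>Description</title>\n<para>Body</para>"

# "." does not match the newline, so this finds nothing.
dot_only = re.search(r"</title>.*<para>", text)

# (.|\s) also accepts whitespace, including the newline.
alternation = re.search(r"</title>(.|\s)*?<para>", text)

# Python's DOTALL flag lets "." match the newline directly.
dotall = re.search(r"</title>.*?<para>", text, re.DOTALL)

print(dot_only, alternation.group(0) == dotall.group(0))
```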
Re: Reg expression help needed
Posted: Fri Mar 25, 2011 3:15 pm
by adrian
* means greedy match 0 or more times - it will consume as much as it can
*? means lazy match 0 or more times - it will consume as little as it can
Practical example:
Given the string:
aabbaabb
Searching a.*b will match the entire string: aabbaabb
Searching a.*?b will match only the first three characters: aab (searching again finds the second aab)
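The same comparison as a quick Python sketch, using the string from the example above (the variable names are just for illustration):

```python
import re

text = "aabbaabb"

# Greedy: .* grabs as much as possible, then backs off to the last "b".
greedy = re.search(r"a.*b", text).group(0)

# Lazy: .*? grabs as little as possible, stopping at the first "b".
lazy_first = re.search(r"a.*?b", text).group(0)

# Repeating the lazy search over the whole string finds two short matches.
lazy_all = re.findall(r"a.*?b", text)

print(greedy, lazy_first, lazy_all)
```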