Reg expression help needed


Post by mike004 » Fri Mar 25, 2011 12:11 pm

Any regexp experts out there?

I want to find all <section> elements whose title is "Description":

<section id="xxx">
[whitespace, zero or more blank lines]
<title>Description</title>
[XML content]
</section>

And change them to <bridgehead>:

<bridgehead id="xxx">
[whitespace]
<title>Description</title>
[XML content]
</bridgehead>

Sounds easy, but it's not. I can find the opening <section> tag:
<section(.*)>(\n*)(\s*)<title>Description</title>

and replace it with <bridgehead>:
<bridgehead$1>$2$3<title>Description</title>

But the closing </section> tag still remains.

Re: Reg expression help needed

Post by adrian » Fri Mar 25, 2011 1:47 pm

Hello,

Do you need to perform this replace manually (a one-time thing), or do you need it for an application/script that does this automatically/repeatedly?

To me it seems you primarily need to rename the section elements to bridgehead. In Oxygen you can easily do this with the Rename Element (ALT+SHIFT+R) action from the contextual menu (right-click a tag and choose Refactoring -> Rename Element...). In the "Rename" dialog, choose the "Rename siblings with the same name" option to rename all the sibling elements.


Or, though it would complicate things needlessly, you could use Find/Replace: search for section, replace with bridgehead, set the XPath to //section, turn on Enable XML search options, and make sure only Element names is selected in the bottom section.


If for some reason (e.g. you need to do this programmatically) you still want to do this with regular expressions, search for:
Code: Select all
<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>

and replace with:
Code: Select all
<bridgehead$1>$2<title>$3</title>$4</bridgehead>
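For anyone scripting this outside the editor, here is a minimal Python sketch of the same search and replace. It assumes Python's re module, where group references in the replacement are written \1 rather than $1. The sample document and the rename_sections helper name are made up for illustration.

```python
import re

# Same pattern as above. (.|\s) lets the section body span line breaks,
# because "." on its own does not match a newline.
SECTION_RE = re.compile(
    r"<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>"
)


def rename_sections(xml_text):
    """Rename each non-nested <section> that starts with a <title> to <bridgehead>."""
    # Python spells the $1..$4 references as \1..\4.
    return SECTION_RE.sub(
        r"<bridgehead\1>\2<title>\3</title>\4</bridgehead>", xml_text
    )


# Hypothetical input, shaped like the example in the question.
sample = (
    '<section id="xxx">\n'
    "\n"
    "  <title>Description</title>\n"
    "  <para>Some content.</para>\n"
    "</section>"
)

result = rename_sections(sample)
print(result)
```

To limit the rename to sections titled "Description", the third group could be replaced with the literal text, as in the original attempt. In Python, [\s\S]*? is a common alternative to (.|\s)*? for "anything, including line breaks".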


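Note that this regular expression won't work properly if you have nested sections, because the lazy match stops at the first </section> it finds, which belongs to the inner section.
For example: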
Code: Select all
<section>
  <section>
  </section>
</section>
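If nesting is a possibility, renaming at the element level (as the Rename Element action does) sidesteps the problem. The following is only a sketch, assuming Python's standard xml.etree.ElementTree module and assuming that only sections whose direct <title> child reads "Description" should be renamed. The helper name and the sample input are invented for illustration.

```python
import xml.etree.ElementTree as ET


def rename_description_sections(xml_text):
    """Rename <section> elements titled "Description" to <bridgehead>."""
    root = ET.fromstring(xml_text)
    # iter() visits the root and all descendants, so nested sections are
    # handled too. list() takes a snapshot before any tag is changed.
    for elem in list(root.iter("section")):
        title = elem.find("title")  # looks at direct children only
        if title is not None and (title.text or "").strip() == "Description":
            elem.tag = "bridgehead"
    return ET.tostring(root, encoding="unicode")


# Hypothetical input with a nested section.
sample = (
    "<chapter>"
    '<section id="outer"><title>Overview</title>'
    '<section id="inner"><title>Description</title><para>Text</para></section>'
    "</section>"
    "</chapter>"
)

result = rename_description_sections(sample)
print(result)
```

With its default parser, ElementTree drops comments and the DOCTYPE and may not reproduce the original formatting exactly, so it is worth checking the output before overwriting any files.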


Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Re: Reg expression help needed

Post by mike004 » Fri Mar 25, 2011 2:37 pm

adrian wrote:
If for some reason (e.g. you need to do this programmatically) you still want to do this with regular expressions, search for:
Code: Select all
<section(.*?)>(\s*)<title>(.*?)</title>((.|\s)*?)</section>

Thanks, I was able to do this by modifying this expression. The key is the (.|\s) expression, which matches "anything", including line breaks.

One final question:

What does (.*?) mean?
.* means match any character, except new-line, 0 or more times.
I guess the ? quantifier means just take the 1st match, or no match?

Re: Reg expression help needed

Post by adrian » Fri Mar 25, 2011 3:15 pm

* means greedy match 0 or more times - it will consume as much as it can
*? means lazy match 0 or more times - it will consume as little as it can

Practical example:
Given the string:
aabbaabb

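Searching a.*b will match the entire string: aabbaabb
Searching a.*?b will match only aab, the shortest run from an a to the next b. It matches twice in this string: the first aab and then the second aab.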
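The difference is easy to check in a script. A small sketch, assuming Python's re module (the greedy and lazy quantifiers behave the same way here as in the editor's Find/Replace):

```python
import re

text = "aabbaabb"

# Greedy: .* grabs as much as it can, then backs off to the last b.
greedy = re.findall(r"a.*b", text)

# Lazy: .*? grabs as little as it can, stopping at the first b it reaches.
lazy = re.findall(r"a.*?b", text)

print(greedy)  # ['aabbaabb']
print(lazy)    # ['aab', 'aab']
```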

