Can't figure out this regular expression

apeled
Posts: 1
Joined: Mon Jun 13, 2005 7:07 pm


Post by apeled

I don't know whether this is possible, but it would make my life 1000 times easier if it were.

I have XML that looks like this:

<book>
<title>some title</title>
<chapter/>
</book>

Basically, I need to erase entire <book> chunks whose <chapter> nodes are empty. I'm not a programmer; otherwise I'm sure I could write Java to do this in two seconds.

I was playing around with the regexp find/replace in Oxygen, and the best I can do is match the entire <title>some title</title> line with the expression <title>.+</title>. But what I need is to match the entire chunk above. I tried things like <book>\S+<title>.+</title>\S+<chapter/>\S+</book>, but it won't work, and neither does <book>[\t\n\r\f\v]*<title>, etc.
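In an engine that searches the whole text at once rather than line by line, the main problem with the \S+ attempt is the character class: \S (capital S) means "any non-whitespace character", so \S+ can never consume the line breaks between the tags, whereas \s* (lowercase) means "zero or more whitespace characters", newlines included. The sketch below uses Python's re module purely for illustration (an assumption; it is not necessarily how Oxygen's find/replace behaves) on the chunk from the question:

```python
import re

# The <book> chunk from the question, as a single string.
text = """<book>
<title>some title</title>
<chapter/>
</book>"""

# \S+ = one or more NON-whitespace characters: it cannot consume the
# newline right after <book>, so this pattern finds nothing.
broken = re.compile(r"<book>\S+<title>.+</title>\S+<chapter/>\S+</book>")

# \s* = zero or more whitespace characters (newlines included), which is
# exactly what sits between the tags.
fixed = re.compile(r"<book>\s*<title>.+</title>\s*<chapter/>\s*</book>")

print(broken.search(text))              # None
print(fixed.search(text) is not None)   # True
```

Note that .+ still stays on one line here, because . does not match a newline unless the DOTALL flag is set.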

Does anyone know how to accomplish this?

Thanks for your help!
Radu
Posts: 9059
Joined: Fri Jul 09, 2004 5:18 pm

Post by Radu

Hi,

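A regexp to select the undesired <book> chunks would be something like this:

(?s)<book>(?:(?!</book>).)*?<chapter/>.*?</book>
The (?s) flag lets . match line breaks, and the (?!</book>) look-ahead stops a match that starts at one <book> from running on past its </book> into the next book in search of a <chapter/>.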
Unfortunately, the standalone versions of Oxygen (including the current version, 7) have a limitation in regular-expression find: matches are searched line by line, so even a correct expression won't match text that spans multiple lines.
A bug report has already been filed to improve this behaviour.
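With an engine that searches the whole text at once, the multi-line match is easy to see. A minimal sketch using Python's re module (an assumption for illustration only) on the two-book sample document from this thread; Python 3.11+ rejects a (?s) flag placed in the middle of a pattern, so the sketch passes re.DOTALL instead. It also shows why the match must not be allowed to cross a </book>: a plain lazy .*? that starts at the first <book> keeps going into the next book.

```python
import re

# Sample document from this thread: one book with a real chapter,
# one book whose chapter is empty.
xml = """<root>
<book>
<title>some title</title>
<chapter>some chapter</chapter>
</book>
<book>
<title>some title</title>
<chapter/>
</book>
</root>"""

# Naive: starting at the FIRST <book>, the lazy .*? expands past </book>
# until it reaches the <chapter/> of the second book, so it eats both books.
naive = re.compile(r"<book>.*?<chapter/>.*?</book>", re.DOTALL)

# Tempered: (?:(?!</book>).)*? refuses to step over a </book>, so a match
# can only start at a <book> that itself contains <chapter/>.
# The trailing \n? (an illustrative addition) also drops the leftover newline.
tempered = re.compile(r"<book>(?:(?!</book>).)*?<chapter/>.*?</book>\n?", re.DOTALL)

print(naive.sub("", xml))     # both books are gone
print(tempered.sub("", xml))  # only the book with "some chapter" remains
```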

For now, you can do one of two things:

1) Use the Eclipse plugin version of Oxygen, which does not have this limitation,
or
2) Use a stylesheet to filter the unwanted <book> elements out of the XML.
For example, if your XML file looks like this:


<?xml version="1.0" encoding="UTF-8"?>
<root>
<book>
<title>some title</title>
<chapter>some chapter</chapter>
</book>
<book>
<title>some title</title>
<chapter/>
</book>
</root>
Then a stylesheet that filters out the books with empty chapters would be something like this:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="root">
<root>
<xsl:for-each select="book">
<!--If the chapter has a non-empty text value, copy the whole book element-->
<xsl:if test="chapter/text()!=''">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
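For comparison, the same filter can be sketched with Python's standard xml.etree.ElementTree module. This is an illustrative assumption, not part of Oxygen: the helper names has_chapter_text and drop_empty_chapter_books are hypothetical, and the check only approximates the XSLT test chapter/text()!='' by looking at each chapter's direct text.

```python
import xml.etree.ElementTree as ET

# Sample input from this thread (XML declaration omitted for brevity).
SOURCE = """<root>
<book>
<title>some title</title>
<chapter>some chapter</chapter>
</book>
<book>
<title>some title</title>
<chapter/>
</book>
</root>"""

def has_chapter_text(book):
    """Rough analogue of chapter/text()!='': true if any <chapter> child
    has non-empty direct text (an empty <chapter/> has text None)."""
    return any(ch.text for ch in book.findall("chapter"))

def drop_empty_chapter_books(root):
    # Collect the books to drop first, then remove them, so the tree is
    # not modified while it is being iterated.
    for book in [b for b in root.findall("book") if not has_chapter_text(b)]:
        root.remove(book)
    return root

root = drop_empty_chapter_books(ET.fromstring(SOURCE))
print(ET.tostring(root, encoding="unicode"))
```

Like the stylesheet, this keeps a book whenever its chapter contains any text, even whitespace only.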
Hope this helps,
Regards, Radu.