Can't figure out this Reg. Expression

Post by **apeled** » Thu Jan 26, 2006 12:17 am

I don't know if this is possible or not, but it would make my life 1000 times easier if it were.

I have XML that looks like this:

<book>
<title>some title</title>
<chapter/>
</book>

Basically I need to erase entire book chunks whose chapter nodes are empty. I'm not a programmer; otherwise I'm sure I could write Java to do this in two seconds.

I was playing around with the regexp find/replace in Oxygen, and the best I can do is match the entire <title>some title</title> line with the following expression: <title>.+</title> But what I need is to match the entire chunk above. Things like <book>\S+<title>.+</title>\S+<chapter/>\S+</book> won't work, and neither does <book>[\t\n\r\f\v]*<title> etc.

Does anyone know how to accomplish this?

Thanks for your help!

Post by **Radu** » Thu Jan 26, 2006 10:02 am

Hi,

A regexp to select the undesired <book> chunks would be something like this:

<book>(?s)(.*?)<chapter/>(?s)(.*?)</book>

Unfortunately the standalone versions of Oxygen (including the current version 7) have a limitation in regular-expression find: matching is done line by line, so even a correct expression won't match text that spans multiple lines.
An issue has already been filed to improve this behaviour.
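Outside the editor, the pattern can be tried on multi-line text with a short script. The following is only a sketch in Python (not something from Oxygen; the sample data and variable names are made up for illustration). It also shows a caveat worth checking before running any replace: a lazy `.*?` can still run from one book's opening tag across `</book>` into the next book, so a guarded variant that refuses to step over `</book>` is included for comparison. Both patterns only recognise the exact spelling `<chapter/>`, not `<chapter></chapter>` or `<chapter />`.

```python
import re

# Hypothetical sample data: the first book has a filled chapter, the second an empty one.
xml = """<root>
    <book>
        <title>first title</title>
        <chapter>some chapter</chapter>
    </book>
    <book>
        <title>second title</title>
        <chapter/>
    </book>
</root>"""

# The pattern from the post, with (?s) expressed as re.DOTALL so that "." also
# matches newlines. (Recent Python versions reject inline global flags that are
# not at the very start of the pattern.)
lazy = re.compile(r"<book>(.*?)<chapter/>(.*?)</book>", re.DOTALL)

# Guarded variant (an assumption of this sketch, not part of the original reply):
# the negative lookahead stops each "." from consuming the start of "</book>",
# so a match can never extend past the end of the book it started in.
guarded = re.compile(
    r"<book>(?:(?!</book>).)*?<chapter/>(?:(?!</book>).)*?</book>",
    re.DOTALL,
)

lazy_result = lazy.sub("", xml)
guarded_result = guarded.sub("", xml)

print(lazy_result)
print(guarded_result)
```

With this sample the lazy pattern starts at the first `<book>` and swallows both books, while the guarded one removes only the book whose chapter is empty. Regular expressions remain fragile on XML in general, which is one more reason to prefer a stylesheet.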

For now there are two options:

1) You can use the Eclipse plugin version of Oxygen, which does not have this limitation,
or
2) You can use a stylesheet to filter the unwanted book elements out of the XML.
If, for example, your XML file is something like this:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <book>
        <title>some title</title>
        <chapter>some chapter</chapter>
    </book>
    <book>
        <title>some title</title>
        <chapter/>
    </book>
</root>

Then a stylesheet that filters out the books with empty chapters would be something like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="root">
        <root>
            <xsl:for-each select="book">
                <!--If the chapter has text value, copy all the book's contents-->
                <xsl:if test="chapter/text()!=''">
                    <xsl:copy-of select="."/>
                </xsl:if>
            </xsl:for-each>
        </root>
    </xsl:template>
</xsl:stylesheet>
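If running a stylesheet is not convenient, roughly the same filter can be written as a small script. This is only a sketch in Python using the standard library's `xml.etree.ElementTree`; the function names and the inline sample are hypothetical, and it assumes the same layout as above (a `root` element with `book` children). Like the stylesheet, it keeps a book only if at least one of its `chapter` children has text, so a book with no chapter at all is dropped too.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the XML shown earlier in this reply.
SAMPLE = """<root>
    <book>
        <title>some title</title>
        <chapter>some chapter</chapter>
    </book>
    <book>
        <title>some title</title>
        <chapter/>
    </book>
</root>"""


def has_chapter_text(book):
    """Return True if any <chapter> child of this book has non-empty text."""
    return any(chapter.text for chapter in book.findall("chapter"))


def drop_books_with_empty_chapters(xml_text):
    """Parse xml_text and remove every <book> whose chapters are all empty."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the list: removing children while iterating
    # over the live element can skip entries.
    for book in list(root.findall("book")):
        if not has_chapter_text(book):
            root.remove(book)
    return root


filtered = drop_books_with_empty_chapters(SAMPLE)
print(ET.tostring(filtered, encoding="unicode"))
```

For a real file, `ET.parse(path)` followed by `tree.write(path)` would replace the inline string; work on a copy of the file first.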

Hope this helps,
Regards, Radu.