
[xsl] Moving element up hierarchy unless text nodes


Subject: [xsl] Moving element up hierarchy unless text nodes
From: "James Cummings james@xxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 27 Feb 2015 23:50:48 -0000

Hi there.

We've been looking at canonicalising the use of <pb/> in a large collection
of TEI P5 XML texts. What we want to do is move each <pb/> up the hierarchy
unless it has text on both sides of it, stopping only when it meets a
sibling element with textual content or when it reaches the body/back/front
elements. For example, someone might have encoded:


====input====
<body>
    <div>
        <lg>
            <l><pb n="1"/> some text here</l>
            <l>some text here <pb n="2"/></l>
        </lg>
        <lg>
            <l>some text <pb n="3"/> some text</l>
            <anchor xml:id="test"/>
            <l><pb n="4"/>some text here</l>
            <l>some text here <pb n="5"/></l>
            <anchor xml:id="test2"/>
        </lg>
    </div>
    <div>
        <head>Some Text</head>
        <lg>
            <!-- A comment here -->
            <l><pb n="6"/>Some text</l>
            <l>Some text<pb n="7"/></l>
        </lg>
    </div>
</body>
=====

And what we'd want to end up with is:

=====
<body>
    <pb n="1"/>
    <div>
        <lg>
            <l> some text here</l>
            <l>some text here </l>
        </lg>
        <pb n="2"/>
        <lg>
            <l>some text <pb n="3"/> some text</l>
            <pb n="4"/>
            <anchor xml:id="test"/>
            <l>some text here</l>
            <l>some text here </l>
            <anchor xml:id="test2"/>
        </lg>
    </div>
    <pb n="5"/>
    <div>
        <head>Some Text</head>
        <pb n="6"/>
        <lg>
            <!-- A comment here -->
            <l>Some text</l>
            <l>Some text</l>
        </lg>
    </div>
    <pb n="7"/>
</body>
=====

So where a <pb/> has text both before and after it (as with n="3"), it
stays where it is. Otherwise it should move to the level in the hierarchy
where its preceding-sibling::node()[1] has text, passing over other empty
elements or comments. Of course, as you might expect, the markup could use
any element names; I just use div/lg/l here because they are short and
nicely hierarchical as an example.

My approach so far has been, on every element, to test whether there is any
text() between where I currently am and the following::pb[1], by selecting
everything between the start and the pb and looking at its normalised
string-length. But so far these tests aren't working right, and I haven't
even got my head round how to do it in reverse for a <pb/> at the end.
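
A minimal sketch of the test described above, assuming an XSLT 2.0
processor so that the node-order operators << and >> are available. It only
reports where each <pb/> stands and moves nothing, and it does not yet skip
back over empty siblings such as <anchor/>. The f: namespace URI, the
function names and the <report> output format are all placeholders invented
for illustration. Elements are matched by local-name() so the sketch works
with or without the TEI namespace.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:pb-sketch"
    exclude-result-prefixes="xs f">

  <xsl:output indent="yes"/>

  <!-- True if $container holds no non-whitespace text before $pb. -->
  <xsl:function name="f:leading" as="xs:boolean">
    <xsl:param name="pb" as="element()"/>
    <xsl:param name="container" as="element()"/>
    <xsl:sequence
        select="empty($container//text()[normalize-space()][. &lt;&lt; $pb])"/>
  </xsl:function>

  <!-- The reverse test: no non-whitespace text after $pb in $container. -->
  <xsl:function name="f:trailing" as="xs:boolean">
    <xsl:param name="pb" as="element()"/>
    <xsl:param name="container" as="element()"/>
    <xsl:sequence
        select="empty($container//text()[normalize-space()][. &gt;&gt; $pb])"/>
  </xsl:function>

  <!-- Outermost ancestor, strictly inside body/front/back, in which $pb
       is still leading; empty if text directly precedes it in its parent.
       The parentheses force document order, so [1] is the outermost. -->
  <xsl:function name="f:outermost-leading" as="element()?">
    <xsl:param name="pb" as="element()"/>
    <xsl:sequence select="($pb/ancestor::*
        [ancestor::*[local-name() = ('body', 'front', 'back')]]
        [f:leading($pb, .)])[1]"/>
  </xsl:function>

  <!-- Hypothetical report: one line per pb, nothing is moved. -->
  <xsl:template match="/">
    <report>
      <xsl:for-each select="//*[local-name() = 'pb']">
        <pb n="{@n}"
            leading="{f:leading(., parent::*)}"
            trailing="{f:trailing(., parent::*)}"
            hoist-before="{local-name(f:outermost-leading(.))}"/>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample input, n="3" should come out with both flags false
and n="1" with hoist-before="div". A real transform would still need the
mirror-image function for trailing <pb/>s and an identity template that
emits each <pb/> at its new position.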

Has anyone done something like this before that I could look at? Any
suggestions?

Thanks for any help!

-James Cummings

