Re: [xsl] Moving element up hierarchy unless text nodes


Subject: Re: [xsl] Moving element up hierarchy unless text nodes
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 6 Apr 2015 19:07:05 -0000

Dear James,

I am relieved it seems to have passed all the tests so far!

One thing that might shed light on the operation of this is the single
edge case for which I think its behavior would be ... interesting,
namely:

<div><lg><l><pb/></l></lg></div>

I hope and trust this never happens in your data.
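
A sketch of why, traced by hand against the stylesheet quoted below rather
than run, and assuming the div is a child of body: the lone pb has no
siblings at all, so both the leading-pb and the trailing-pb mode climb
through l and lg until they reach the body/* template, and both keys bind
the pb to the div. The pb's own template finds nothing under the pb's own
id, so it copies nothing in place, while the div emits the pb once before
itself and once after. The wrapper element names in this example are
hypothetical, used only to keep the two fragments in one well-formed
document.

```xml
<example>
    <!-- input: the edge case, assumed to sit directly inside body -->
    <input>
        <body><div><lg><l><pb/></l></lg></div></body>
    </input>
    <!-- result expected from a hand trace: the pb is duplicated -->
    <traced-output>
        <body><pb/><div><lg><l/></lg></div><pb/></body>
    </traced-output>
</example>
```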

Cheers, Wendell


On Mon, Apr 6, 2015 at 9:22 AM, James Cummings james@xxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I _finally_ had a chance to test and make sure I think I understand the
> clever solution Wendell came up with for moving <pb/> elements before or
> after nodes with no text content and/or whitespace-only nodes. I must
> apologise to him for delaying so long in doing so. Mea culpa.
>
> I've added some comments to the XSL to ensure I understood what was going
> on. Although I've never really been good with key()s the bits that confused
> me most were:
> ===
>     <!-- copy pb if it is both leading and trailing, thus stays put -->
>     <xsl:template match="pb">
>         <xsl:if test="(. is key('leading-pb',generate-id())) and
>             (. is key('trailing-pb',generate-id()))">
>             <xsl:copy-of select="."/>
>         </xsl:if>
>     </xsl:template>
> ===
> Where, if I understand it, a <pb/> is only copied in place if looking up its
> own generate-id() in both the leading-pb and trailing-pb keys returns that
> same <pb/> (i.e. it is in the middle of some elements with text, or a text
> node, or similar, so it stays where it is).
>
> The other confusing bit for me was the test in the leading/trailing-pb modes
> matching any element, but on closer inspection I think I understand it.
> (Though I never would have thought of it...) In trailing-pb mode this tests
> that there are no following-sibling elements other than pb and no
> following-sibling text that isn't just whitespace; if so, it applies
> templates to the parent. Otherwise it generates an id.
> ===
>         <xsl:choose>
>             <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>                 following-sibling::text()[matches(.,'\S')])">
>                 <xsl:apply-templates select=".." mode="trailing-pb"/>
>             </xsl:when>
>             <xsl:otherwise>
>                 <xsl:sequence select="generate-id()"/>
>             </xsl:otherwise>
>         </xsl:choose>
> ===
>
> I think I understand all the individual bits to this but still have
> difficulty thinking through the whole thing.
>
> It does seem to work on all the tests I've tried.
>
> Thanks Wendell!
>
> -James
>
> =====full xslt===
>   <!-- comments, processing instructions, text nodes and attributes -->
>     <xsl:template match="comment() | processing-instruction() | text() | @*">
>         <xsl:copy-of select="."/>
>     </xsl:template>
>
>     <!-- copy elements separately so can move pb elements -->
>     <xsl:template match="*">
>         <!-- before the element, copy any pb keyed to it as leading -->
>         <xsl:copy-of select="key('leading-pb',generate-id())"/>
>         <!-- copy the element, attributes, and process nodes -->
>         <xsl:copy>
>             <xsl:apply-templates select="@* | node()"/>
>         </xsl:copy>
>         <xsl:copy-of select="key('trailing-pb',generate-id())"/>
>     </xsl:template>
>
>     <!-- copy pb if it is both leading and trailing, thus stays put -->
>     <xsl:template match="pb">
>         <xsl:if test="(. is key('leading-pb',generate-id())) and
>             (. is key('trailing-pb',generate-id()))">
>             <xsl:copy-of select="."/>
>         </xsl:if>
>     </xsl:template>
>
>     <!-- key for leading pb applying templates in leading-pb mode -->
>     <xsl:key name="leading-pb" match="pb">
>         <xsl:apply-templates select="." mode="leading-pb"/>
>     </xsl:key>
>     <!-- key for trailing pb applying templates in trailing-pb mode -->
>     <xsl:key name="trailing-pb" match="pb">
>         <xsl:apply-templates select="." mode="trailing-pb"/>
>     </xsl:key>
>
>     <!-- stop climbing at the children of body: return this element's id -->
>     <xsl:template match="body/*" mode="leading-pb trailing-pb">
>         <xsl:sequence select="generate-id()"/>
>     </xsl:template>
>
>     <!-- when there are no preceding-sibling elements (other than pb) and
>          no non-whitespace preceding-sibling text, apply-templates in
>          leading-pb mode to the parent; otherwise return this element's id -->
>     <xsl:template match="*" mode="leading-pb">
>         <xsl:choose>
>             <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
>                 preceding-sibling::text()[matches(.,'\S')])">
>                 <xsl:apply-templates select=".." mode="leading-pb"/>
>             </xsl:when>
>             <xsl:otherwise>
>                 <xsl:sequence select="generate-id()"/>
>             </xsl:otherwise>
>         </xsl:choose>
>     </xsl:template>
>
>     <!-- when there are no following-sibling elements (other than pb) and
>          no non-whitespace following-sibling text, apply-templates in
>          trailing-pb mode to the parent; otherwise return this element's id -->
>     <xsl:template match="*" mode="trailing-pb">
>         <xsl:choose>
>             <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>                 following-sibling::text()[matches(.,'\S')])">
>                 <xsl:apply-templates select=".." mode="trailing-pb"/>
>             </xsl:when>
>             <xsl:otherwise>
>                 <xsl:sequence select="generate-id()"/>
>             </xsl:otherwise>
>         </xsl:choose>
>     </xsl:template>
>  =====
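>
> A note for anyone wanting to run the above as it stands: the templates and
> keys need a stylesheet wrapper, and since xsl:key with content, the `is` and
> `except` operators and empty() are XSLT/XPath 2.0, the version should be 2.0.
> A minimal sketch, assuming the input is in the TEI namespace (in which case
> unprefixed names such as pb and body only match with
> xpath-default-namespace set; drop that attribute for no-namespace test data
> like the sample input quoted further down). The xsl:output settings are an
> assumption too, not part of the original code.
>
> ```xml
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xpath-default-namespace="http://www.tei-c.org/ns/1.0">
>
>     <!-- assumption: serialise as XML without re-indenting, so that
>          whitespace-only text nodes pass through unchanged -->
>     <xsl:output method="xml" indent="no"/>
>
>     <!-- the templates and keys listed above go here -->
>
> </xsl:stylesheet>
> ```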
>
> On Wed, Mar 4, 2015 at 12:36 AM, James Cummings james@xxxxxxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>
>> Cool Wendell!
>>
>> I've not had a chance to test this out yet, I may have to come back to you
>> with some questions as I'm really not sure I understand that match pattern.
>> I'll have a play with it.
>>
>> Many thanks!
>>
>> -James
>>
>> On Tue, Mar 3, 2015 at 7:48 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi again James,
>>>
>>> So in the code I posted yesterday I realized at least one more
>>> interesting improvement is possible.
>>>
>>> Instead of
>>>
>>> <xsl:template match="pb">
>>>   <!-- Only copy the pb if no ancestor considers it 'leading' or
>>> 'trailing'. -->
>>>   <xsl:if test="empty(ancestor::*/
>>>         (key('leading-pb',generate-id()) |
>>>          key('trailing-pb',generate-id())) intersect . )  ">
>>>     <xsl:copy-of select="."/>
>>>   </xsl:if>
>>> </xsl:template>
>>>
>>> We could have more directly and efficiently
>>>
>>>   <xsl:template match="pb">
>>>     <xsl:if test="(. is key('leading-pb',generate-id())) and
>>>             (. is key('trailing-pb',generate-id()))">
>>>       <xsl:copy-of select="."/>
>>>     </xsl:if>
>>>   </xsl:template>
>>>
>>>
>>> Or even (if you are crazy for match patterns, and who isn't)
>>>
>>> <xsl:template match="pb[empty(key('leading-pb',generate-id())) or
>>>       empty(key('trailing-pb',generate-id()))]"/>
>>>
>>> These work because the keys bind pb elements to themselves when they
>>> are not 'leading' or 'trailing' (i.e. correctly outside not inside
>>> their parent).
>>>
>>> Cheers, Wendell
>>>
>>> On Mon, Mar 2, 2015 at 2:11 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
>>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> > Hi James,
>>> >
>>> > So, try this. It works by assigning 'pb' elements to ancestors that
>>> > consider them 'leading' (start the element off) or 'trailing'. They
>>> > can be retrieved from (for) said ancestor using a key.
>>> >
>>> > Lightly tested.
>>> >
>>> > <xsl:template match="comment() | processing-instruction() | text() |
>>> > @*">
>>> >   <xsl:copy-of select="."/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*">
>>> >   <xsl:copy-of select="key('leading-pb',generate-id())"/>
>>> >   <xsl:copy>
>>> >     <xsl:apply-templates select="@* | node()"/>
>>> >   </xsl:copy>
>>> >   <xsl:copy-of select="key('trailing-pb',generate-id())"/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="pb">
>>> >   <!-- Only copy the pb if no ancestor considers it 'leading' or
>>> > 'trailing'. -->
>>> >   <xsl:if test="empty(
>>> >     ancestor::*/(key('leading-pb',generate-id()) |
>>> > key('trailing-pb',generate-id())) intersect . )  ">
>>> >     <xsl:copy-of select="."/>
>>> >   </xsl:if>
>>> > </xsl:template>
>>> >
>>> > <xsl:key name="leading-pb" match="pb">
>>> >   <xsl:apply-templates select="." mode="leading-pb"/>
>>> > </xsl:key>
>>> >
>>> > <xsl:key name="trailing-pb" match="pb">
>>> >   <xsl:apply-templates select="." mode="trailing-pb"/>
>>> > </xsl:key>
>>> >
>>> > <xsl:template match="body/*" mode="leading-pb trailing-pb">
>>> >   <xsl:sequence select="generate-id()"/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*" mode="leading-pb">
>>> >   <xsl:choose>
>>> >     <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
>>> > preceding-sibling::text()[matches(.,'\S')])">
>>> >       <xsl:apply-templates select=".." mode="leading-pb"/>
>>> >     </xsl:when>
>>> >     <xsl:otherwise>
>>> >       <xsl:sequence select="generate-id()"/>
>>> >     </xsl:otherwise>
>>> >   </xsl:choose>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*" mode="trailing-pb">
>>> >   <xsl:choose>
>>> >     <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>>> > following-sibling::text()[matches(.,'\S')])">
>>> >       <xsl:apply-templates select=".." mode="trailing-pb"/>
>>> >     </xsl:when>
>>> >     <xsl:otherwise>
>>> >       <xsl:sequence select="generate-id()"/>
>>> >     </xsl:otherwise>
>>> >   </xsl:choose>
>>> > </xsl:template>
>>> >
>>> > Feel free to ask for any explanation needed. It *seems* to work
>>> > (although I often do not trust my lying eyes) ... :-)
>>> >
>>> > Cheers, Wendell
>>> >
>>> > On Fri, Feb 27, 2015 at 6:51 PM, James Cummings
>>> > james@xxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> Hi there.
>>> >>
>>> >> We've been looking at canonicalising use of <pb/> in a large collection
>>> >> of TEI P5 XML texts. What we want to do is move each <pb/> up the
>>> >> hierarchy unless there is text before or after it, only stopping when
>>> >> there is a sibling element with textual content or when it hits the
>>> >> body/back/front elements. For example, someone might have encoded:
>>> >>
>>> >>
>>> >> ====input====
>>> >> <body>
>>> >>     <div>
>>> >>         <lg>
>>> >>             <l><pb n="1"/> some text here</l>
>>> >>             <l>some text here <pb n="2"/></l>
>>> >>         </lg>
>>> >>         <lg>
>>> >>             <l>some text <pb n="3"/> some text</l>
>>> >>             <anchor xml:id="test"/>
>>> >>             <l><pb n="4"/>some text here</l>
>>> >>             <l>some text here <pb n="5"/></l>
>>> >>             <anchor xml:id="test2"/>
>>> >>         </lg>
>>> >>     </div>
>>> >>     <div>
>>> >>         <head>Some Text</head>
>>> >>         <lg>
>>> >>             <!-- A comment here -->
>>> >>             <l><pb n="6"/>Some text</l>
>>> >>             <l>Some text<pb n="7"/></l>
>>> >>         </lg>
>>> >>     </div>
>>> >> </body>
>>> >> =====
>>> >>
>>> >> And what we'd want to end up with is:
>>> >>
>>> >> =====
>>> >> <body>
>>> >>     <pb n="1"/>
>>> >>     <div>
>>> >>         <lg>
>>> >>             <l> some text here</l>
>>> >>             <l>some text here </l>
>>> >>         </lg>
>>> >>         <pb n="2"/>
>>> >>         <lg>
>>> >>             <l>some text <pb n="3"/> some text</l>
>>> >>             <pb n="4"/>
>>> >>             <anchor xml:id="test"/>
>>> >>             <l>some text here</l>
>>> >>             <l>some text here </l>
>>> >>             <anchor xml:id="test2"/>
>>> >>         </lg>
>>> >>     </div>
>>> >>     <pb n="5"/>
>>> >>     <div>
>>> >>         <head>Some Text</head>
>>> >>         <pb n="6"/>
>>> >>         <lg>
>>> >>             <!-- A comment here -->
>>> >>             <l>Some text</l>
>>> >>             <l>Some text</l>
>>> >>         </lg>
>>> >>     </div>
>>> >>     <pb n="7"/>
>>> >> </body>
>>> >> =====
>>> >>
>>> >> So where the <pb/> has text before and after it (as with n="3"), it
>>> >> stays where it is. Otherwise it should move to the level in the
>>> >> hierarchy where its preceding-sibling::node()[1] has text, passing over
>>> >> other empty elements or comments. (Of course, as you might expect, the
>>> >> markup could be any element names; I just use div/lg/l here because it
>>> >> is short and nicely hierarchical as an example.) My approach so far has
>>> >> been, on every element, to try to test if there is text() between where
>>> >> I currently am and the following::pb[1] by selecting everything between
>>> >> the start and the pb and looking at its normalised string-length. But
>>> >> so far these tests aren't working right, and I haven't even got my head
>>> >> round how to do it in reverse for <pb/> at the end.
>>> >>
>>> >> Has anyone done something like this before that I could look at? Any
>>> >> suggestions?
>>> >>
>>> >> Thanks for any help!
>>> >>
>>> >> -James Cummings
>>> >
>>> >
>>> >
>>> > --
>>> > Wendell Piez | http://www.wendellpiez.com
>>> > XML | XSLT | electronic publishing
>>> > Eat Your Vegetables
>>> > _____oo_________o_o___ooooo____ooooooo_^
>>> >



-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^

