RE: [xsl] Grouping text nodes


Subject: RE: [xsl] Grouping text nodes
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 3 Aug 2005 11:56:59 +0100

In XSLT 1.0 I would tackle this using the technique that I've started
referring to as "sibling recursion". The general pattern is:

(a) From the parent element do

   <xsl:apply-templates select="child::node()[1]" mode="sibling-recursion"/>

(b) Write one or more templates that match the child nodes; the structure
of these is:

<xsl:template match="xxx" mode="sibling-recursion">
   ... process this node ...
   <xsl:apply-templates select="following-sibling::node()[1]"
                        mode="sibling-recursion">
      ... with-params ...
   </xsl:apply-templates>
</xsl:template>
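
As a minimal sketch of this pattern (not a solution to the full problem), the
stylesheet below carries the current verse id and a line counter forward as
parameters, so nothing has to look backwards. It assumes simplified input in
which each line is a single text node directly inside the div; the parameter
names "verse" and "n" are made up for illustration.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- (a) The parent starts the walk at its first child node. -->
  <xsl:template match="div[@class='poetry']">
    <div type="poetry">
      <xsl:apply-templates select="child::node()[1]" mode="sibling-recursion"/>
    </div>
  </xsl:template>

  <!-- (b) A verse-number span: remember its id and restart the line count. -->
  <xsl:template match="span[@class='vn']" mode="sibling-recursion">
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="string(@id)"/>
      <xsl:with-param name="n" select="1"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- (b) A non-blank text node: emit one line, then pass on n + 1. -->
  <xsl:template match="text()[normalize-space()]" mode="sibling-recursion">
    <xsl:param name="verse" select="''"/>
    <xsl:param name="n" select="1"/>
    <l xml:id="{$verse}-{$n}"><xsl:value-of select="normalize-space(.)"/></l>
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="$verse"/>
      <xsl:with-param name="n" select="$n + 1"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- (b) Anything else (br, blank text, ...): output nothing and hand
       the parameters on unchanged. -->
  <xsl:template match="node()" mode="sibling-recursion">
    <xsl:param name="verse" select="''"/>
    <xsl:param name="n" select="1"/>
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="$verse"/>
      <xsl:with-param name="n" select="$n"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

The walk stops by itself when following-sibling::node()[1] selects nothing.
This sketch emits a flat list of lines; wrapping each verse in its own
element would need further work.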

In XSLT 2.0, converting "text<br/>" to "<line>text</line>" is often
conveniently done using xsl:for-each-group with group-ending-with="br".
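
A rough sketch of that grouping, assuming the text and the br elements are
direct children of the div being matched (the test that skips groups holding
only whitespace is an addition for tidiness, not part of the technique):

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="div">
    <div>
      <!-- Each group runs up to and including the next br. -->
      <xsl:for-each-group select="node()" group-ending-with="br">
        <!-- Skip groups that contain nothing but whitespace. -->
        <xsl:if test="current-group()[not(self::br)][normalize-space()]">
          <line>
            <!-- Copy the group's content, dropping the terminating br. -->
            <xsl:copy-of select="current-group()[not(self::br)]"/>
          </line>
        </xsl:if>
      </xsl:for-each-group>
    </div>
  </xsl:template>

</xsl:stylesheet>
```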


This doesn't by itself help with your problem of handling the irregularities
in your input data. I think that when you have such irregularities, it's
often best to write a multiphase transformation in which each phase tries to
make the structure a bit more regular, making it easier for subsequent
phases to do their work.
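
One way such a pipeline might be laid out in XSLT 2.0 is sketched below: each
phase has its own mode, and the result of the first phase is held in a
variable. The mode names "phase1" and "phase2" are invented for illustration,
and the choice of what each phase does is only a guess at what would suit this
data. Phase 1 replaces a div nested inside the chapter by start and end
milestones, so that its content becomes sibling content of the chapter; phase 2
then groups the flattened content into lines.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Run phase 1 into a temporary tree, then run phase 2 over it. -->
  <xsl:template match="/">
    <xsl:variable name="phase1">
      <xsl:apply-templates mode="phase1"/>
    </xsl:variable>
    <xsl:apply-templates select="$phase1" mode="phase2"/>
  </xsl:template>

  <!-- In both phases, anything not handled explicitly is copied as is. -->
  <xsl:template match="@*|node()" mode="phase1 phase2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <!-- Phase 1: unwrap a nested div, marking its extent with milestones. -->
  <xsl:template match="div[@class='chapter']/div" mode="phase1">
    <milestone type="{@class}"/>
    <xsl:apply-templates mode="phase1"/>
    <milestone type="Endof{@class}"/>
  </xsl:template>

  <!-- Phase 2: with the content now flat, group it into lines at each br. -->
  <xsl:template match="div[@class='chapter']" mode="phase2">
    <div type="chapter">
      <xsl:for-each-group select="node()" group-ending-with="br">
        <!-- Keep a group only if it holds an element or non-blank text. -->
        <xsl:if test="current-group()[not(self::br)][self::* or normalize-space()]">
          <l>
            <xsl:apply-templates select="current-group()[not(self::br)]"
                                 mode="phase2"/>
          </l>
        </xsl:if>
      </xsl:for-each-group>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch the verse-number and chapter-number spans, and any trailing
milestone, simply end up inside whichever line they fall in; a further phase
could pull them out.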

But I'm afraid these are only rough ideas - I don't have time to get
immersed in the detail of what looks quite a challenging problem.

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: James Cummings [mailto:cummings.james@xxxxxxxxx] 
> Sent: 03 August 2005 10:49
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Grouping text nodes
> 
> Hi there,
> 
> I have some XHTML I'm trying to transform to add more structure to it.
>  It is a copy of the Latin Vulgate Bible.  Currently the XHTML looks
> something like this:
> -----
> <div class="chapter">
> <span class="chapter-num">1</span>
>         <div class="poetrystartchapter">
>                     <span class="vn" 
> id="x1_1">1</span>&nbsp;Beatus vir qui
>                     non abiit in consilio impiorum,<br/> et 
> in via peccatorum
>                     non stetit,<br/> et in cathedra pestilenti&aelig;
> non sedit&nbsp;;<br/>
>                     <span class="vn" 
> id="x1_2">2</span>&nbsp;sed in lege
>                     Domini voluntas ejus,<br/> et in lege 
> ejus meditabitur die
>                     ac nocte.<br/>
>                     <span class="vn" 
> id="x1_3">3</span>&nbsp;Et erit tamquam
>                     lignum quod plantatum est secus decursus 
> aquarum,<br/> quod
>                     fructum suum dabit in tempore 
> suo&nbsp;:<br/> et folium
>                     ejus non defluet&nbsp;;<br/> et omnia
>                     qu&aelig;cumque faciet prosperabuntur.<br/>
> ...</div>...</div>
> -----
> What I want to get is something like:
> -----
> <div type="chapter" n="1">
>              <milestone type="poetrystartchapter"/>
>              <lg xml:id="x1_1" n="1">
>                     <l xml:id="x1_1-1">Beatus vir qui
>                     non abiit in consilio impiorum,</l>
>                    <l xml:id="x1_1-2">et in via peccatorum 
> non stetit,</l>
>                     <l xml:id="x1_1-3">et in cathedra
> pestilenti&aelig; non sedit </l>
>               </lg>
>               <lg xml:id="x1_2" n="2">
>                     <l xml:id="x1_2-1"> sed in lege Domini 
> voluntas ejus,</l>
>                     <l xml:id="x1_2-2">et in lege ejus meditabitur die
> ac nocte.</l>
>                </lg>
>                 <lg xml:id="x1_3">
>                      <l xml:id="x1_3-1"> Et erit tamquam
>                     lignum quod plantatum est secus decursus 
> aquarum,</l>
>                     <l xml:id="x1_3-2"> quod fructum suum dabit in
> tempore suo :</l>
>                     <l xml:id="x1_3-3"> et folium ejus non 
> defluet;</l>
>                     <l xml:id="x1_3-4"> et omnia qu&aelig;cumque
> faciet prosperabuntur.</l>
>                      </lg>
> <milestone type="EndOfpoetrystartchapter"/>
> ...</div>
> -----
> My problem is when I'm looking backwards to create the @xml:id for
> each of the lines whilst grouping the text nodes into lines. 
> Sometimes there is extra existing structure which seems to get in the
> way, where the <div> (if present at all) starts after the first line:
> 
> -----
>  <div class="chapter"><span class="chapter-num">118</span>
>                 <span class="vn" id="x118_1">1</span>&nbsp;Alleluja. 
>                     <div class="poetry"><span
> class="speaker">Aleph.</span> Beati
>                     immaculati in via,<br/> qui ambulant in 
> lege Domini.<br/>
>                     <span class="vn" 
> id="x118_2">2</span>&nbsp;Beati qui
>                     scrutantur testimonia ejus&nbsp;;<br/> in 
> toto corde
>                     exquirunt eum.<br/>
> -----
> Which is supposed to come out something like:
> -----
>  <div type="chapter" n="118">
>                 <lg xml:id="x118_1" n="1">
>                      <l xml:id="x118_1-1">Alleluja.
>                       <milestone type="poetry"/>
>                     <seg type="speaker">Aleph.</seg> Beati immaculati
> in via,</l>
>                      <l xml:id="x118_1-2"> qui ambulant in 
> lege Domini.</l>
>                  </lg>
>                   <lg>
>                      <l xml:id="x118_2-1"> Beati qui scrutantur
> testimonia ejus; </l>
>                      <l xml:id="x118_2-2"> in toto corde  
> exquirunt eum.</l>
>                   </lg>
>                    <milestone type="Endofpoetry"/>
> ... </div>
> -----
> At the moment, when matching text() to create the lines, I look
> back (preceding:: or preceding-sibling::) to the span and grab the
> span/@id to create the l/@xml:id... but in instances like psalm 118,
> where another div or span gets in the way, it tends to muck up.
> 
> So I'm convinced there is probably an entirely better way to do this. 
> Any suggestions?
> 
> Many Thanks,
> -James
> 
> -- 
> James Cummings, Cummings dot James at GMail dot com

