RE: [xsl] XSLT2 node comparison, wordlists


Subject: RE: [xsl] XSLT2 node comparison, wordlists
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 24 Oct 2007 18:34:08 +0100

Nice to see you back, James.

It's not easy, actually! It's in effect value-based grouping where the
equality function is deep-equal() rather than the eq operator. I think I
would do this by something along the lines of

<xsl:for-each-group group-by="saxon:serialize(.)">

except that doesn't quite work because you want to ignore the xml:id.

If the markup is only one level deep you could do the same thing by hand,
along the lines of

<xsl:for-each-group group-by="my:serialize(.)">


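<xsl:function name="my:serialize">
  <xsl:param name="in" as="element(orth)"/>
  <!-- xsl:value-of merges the generated text nodes into a single one,
       so each orth yields exactly one grouping key -->
  <xsl:value-of>
    <xsl:apply-templates select="$in/child::node()" mode="grouping-key"/>
  </xsl:value-of>
</xsl:function>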

<xsl:template match="text()" mode="grouping-key">
  <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="*" mode="grouping-key">
  <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:for-each select="@*">
      <xsl:text> </xsl:text>
...etc

Of course the grouping key doesn't actually need to be an XML serialization,
it can have any syntax you fancy so long as it distinguishes distinct
values.
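
To make the idea concrete, here is a minimal sketch of a complete XSLT 2.0 stylesheet built around such a key. It is an illustration under stated assumptions, not a tested solution: the my: namespace URI, the function name my:key and the compact key syntax (with "</>" as the closing marker) are invented for the example, the elements are assumed to be in no namespace as in the sample, and text or attribute values are assumed not to contain characters that mimic the key's own delimiters. Unlike the one-level version above it recurses into child elements, and it skips xml:id at every level.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: one string key per orth element.
       xsl:value-of collapses the generated text nodes into one value. -->
  <xsl:function name="my:key" as="xs:string">
    <xsl:param name="in" as="element(orth)"/>
    <xsl:value-of>
      <xsl:apply-templates select="$in/node()" mode="grouping-key"/>
    </xsl:value-of>
  </xsl:function>

  <xsl:template match="text()" mode="grouping-key">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Element: name, attributes other than xml:id in sorted order,
       then the children, then a closing marker. -->
  <xsl:template match="*" mode="grouping-key">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:for-each select="@* except @xml:id">
      <xsl:sort select="name()"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text>="</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>"</xsl:text>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates mode="grouping-key"/>
    <xsl:text>&lt;/&gt;</xsl:text>
  </xsl:template>

  <!-- Keep one orth per distinct key and give it a new xml:id. -->
  <xsl:template match="form[@type = 'orthVar']">
    <xsl:variable name="entry-id" select="ancestor::entry[1]/@xml:id"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="n"
          select="count(distinct-values(orth/my:key(.)))"/>
      <xsl:for-each-group select="orth" group-by="my:key(.)">
        <orth>
          <xsl:attribute name="xml:id">
            <xsl:value-of select="$entry-id"/>
            <xsl:text>-v</xsl:text>
            <!-- position() is the group's position: A, B, C, ... -->
            <xsl:number value="position()" format="A"/>
          </xsl:attribute>
          <xsl:copy-of select="node()"/>
        </orth>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample entry, this should leave six orth elements in the orthVar form, in order of first appearance. Inside the xsl:for-each-group, current-group()/@xml:id gives the original ids of all members of the group, which is what you would need for building the link elements.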

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: James Cummings [mailto:cummings.james@xxxxxxxxx] 
> Sent: 24 October 2007 17:56
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] XSLT2 node comparison, wordlists
> 
> I'm sure this is easy to do in XSLT2 but I've just not got my 
> head wrapped around how to compare things properly in an 
> efficient manner.
> 
> Let's say I have a wordlist where automatically generated 
> from another file I've got instances of how each word was 
> used.  In many cases these are identical in spelling, and 
> what I want to do is merge them and store links between the 
> original file and the wordlist in a stand-off markup method.
> 
> Say the file has entries for each word which are like:
> 
> =====
> <entry xml:id="let22-w27">
>   <form>
>     <orth type="hw">the</orth>
>     <form type="orthVar">
>       <orth xml:id="w72">The</orth>
>       <orth xml:id="w3955">The</orth>
>       <orth xml:id="w4513">The</orth>
>       <orth xml:id="w4578">The</orth>
>       <orth xml:id="w4650">The</orth>
>       <orth xml:id="w4672">The</orth>
>       <orth xml:id="w4703">The</orth>
>       <orth xml:id="w4824">The</orth>
>       <orth xml:id="w4830">The</orth>
>       <orth xml:id="w2045">the</orth>
>       <orth xml:id="w2079">the</orth>
>       <orth xml:id="w2101">the</orth>
>       <orth xml:id="w2112">the</orth>
>       <orth xml:id="w2333">the</orth>
>       <orth xml:id="w2400">the</orth>
>       <orth xml:id="w2442">the</orth>
>       <orth xml:id="w1402">T<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w2422">T<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w6458">T<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w7822">T<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w2097">t<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w2155">t<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w2482">t<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w5887">t<ex>h</ex><hi rend="sup">e</hi></orth>
>       <orth xml:id="w5642">T<ex>h</ex>e</orth>
>       <orth xml:id="w5378">t<ex>h</ex>e</orth>
>       </form>
>   </form>
> </entry>
> =====
> What I want to end up with is for each form[@type='orthVar'] 
> only distinct-values for the orth elements therein with new 
> @xml:id values, and the old ones preserved at the bottom of 
> the file linking new values with the current ones (which are 
> copies from a different file).
>  So something like:
> 
> =====
> <div>
>   <entry xml:id="let22-w27">
>     <form>
>       <orth type="hw">the</orth>
>       <form type="orthVar" n="6"> <!-- n= num of diff variants-->
>         <orth xml:id="let22-w27-vA">The</orth>
>         <orth xml:id="let22-w27-vB">the</orth>
>         <orth xml:id="let22-w27-vC">T<ex>h</ex><hi 
> rend="sup">e</hi></orth>
>         <orth xml:id="let22-w27-vD">t<ex>h</ex><hi 
> rend="sup">e</hi></orth>
>         <orth xml:id="let22-w27-vE">T<ex>h</ex>e</orth>
>         <orth xml:id="let22-w27-vF">t<ex>h</ex>e</orth>
>       </form>
>     </form>
>   </entry>
> 
>   <!-- more entries -->
> 
>   <!-- at bottom of file -->
>   <div type="links">
>   <linkGrp xml:id="let22-w27-lg">
>   <!-- links between the orth form above with its instance in 
> file.xml -->
>     <link targets="#let22-w27-vA  file.xml#w72 file.xml#w3955
>       file.xml#w4513 file.xml#w4578 file.xml#w4650 file.xml#w4672
>       file.xml#w4703  file.xml#w4824 file.xml#w4830"/>
>     <link targets="#let22-w27-vB file.xml#w2045  file.xml#w2079
>       file.xml#w2101 file.xml#w2112 file.xml#w2333 file.xml#w2400
>       file.xml#w2442"/>
>     <link targets="#let22-w27-vC file.xml#w1402 file.xml#w2422
>       file.xml#w6458 file.xml#w7822 "/>
>     <link targets="#let22-w27-vD file.xml#w2097 file.xml#w2155
>       file.xml#w2482 file.xml#w5887"/>
>     <link targets="#let22-w27-vE file.xml#w5642"/>
>     <link targets="#let22-w27-vF  file.xml#w5378"/>
>   </linkGrp>
>     <!-- more linkGrps -->
>     </div>
> </div>
> ======
> XSLT2 is certainly usable in this case, but all of my 
> attempts have been hideously inefficient, or fail to 
> accurately compare the nested children properly.
> 
> Suggestions?
> 
> Thanks,
> -James
> 
> --
> James Cummings, Cummings dot James at GMail dot com

