Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing

From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Mon, 20 Aug 2012 01:03:12 +0200

You could try the following:

In a first pass, calculate a hash value (string) for each map.

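Canonicalize the hash calculation by sorting the singletonMaps by their from and to values, so that two maps holding the same pairs in a different order get the same hash. A hash for the given map may then simply be its sorted from/to pairs joined into a single string.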
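
A minimal, untested sketch of such a hash function might look like this. The function name ct:hash, the namespace URI bound to the ct prefix and the separators '=' and '|' are assumptions of mine; pick separators that cannot occur in your from and to values.

     <xsl:stylesheet version="2.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:ct="http://example.org/ct"
         exclude-result-prefixes="xs ct">

         <!-- Hypothetical helper: order-independent hash string for one map -->
         <xsl:function name="ct:hash" as="xs:string">
             <xsl:param name="map" as="element(map)"/>
             <!-- One "from=to" string per singletonMap, in sorted order -->
             <xsl:variable name="pairs" as="xs:string*">
                 <xsl:for-each select="$map/singletonMap">
                     <xsl:sort select="@from"/>
                     <xsl:sort select="@to"/>
                     <xsl:sequence select="concat(@from, '=', @to)"/>
                 </xsl:for-each>
             </xsl:variable>
             <!-- Join the sorted pairs into a single string -->
             <xsl:sequence select="string-join($pairs, '|')"/>
         </xsl:function>

     </xsl:stylesheet>

For the sample map in your message this should give a string that starts with A320=SFO|B707=ORD. The exact string does not matter; what matters is that two maps with the same pairs always produce the same string.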

Store the hash-enhanced maps in a variable of type document-node(element(maps)) so that you will be able to use this document node as a third argument to the key() function.

Define a key that returns a map for its hash value.

Apply distinct-values to the sequence of hashes and, for each distinct hash, return the first item that the key lookup returns for this hash value.
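
Put together, the two-pass variant could be sketched as follows (untested). The key name map-by-hash, the hash attribute, the ct:hash helper, the ct namespace URI and the assumption that the input document has a maps root element are mine, not taken from your code.

     <xsl:stylesheet version="2.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:ct="http://example.org/ct"
         exclude-result-prefixes="xs ct">

         <!-- Key that returns the map(s) for a given hash value -->
         <xsl:key name="map-by-hash" match="map" use="@hash"/>

         <!-- Hypothetical helper: order-independent hash string for one map -->
         <xsl:function name="ct:hash" as="xs:string">
             <xsl:param name="map" as="element(map)"/>
             <xsl:variable name="pairs" as="xs:string*">
                 <xsl:for-each select="$map/singletonMap">
                     <xsl:sort select="@from"/>
                     <xsl:sort select="@to"/>
                     <xsl:sequence select="concat(@from, '=', @to)"/>
                 </xsl:for-each>
             </xsl:variable>
             <xsl:sequence select="string-join($pairs, '|')"/>
         </xsl:function>

         <xsl:function name="ct:distinct" as="element(map)*">
             <xsl:param name="maps" as="element(map)*"/>

             <!-- Pass 1: copy every map and add its hash as an attribute -->
             <xsl:variable name="hashed" as="document-node(element(maps))">
                 <xsl:document>
                     <maps>
                         <xsl:for-each select="$maps">
                             <map hash="{ct:hash(.)}">
                                 <xsl:copy-of select="@*, node()"/>
                             </map>
                         </xsl:for-each>
                     </maps>
                 </xsl:document>
             </xsl:variable>

             <!-- Pass 2: one key lookup per distinct hash, keep the first hit -->
             <xsl:for-each select="distinct-values($hashed/maps/map/@hash)">
                 <xsl:sequence select="key('map-by-hash', ., $hashed)[1]"/>
             </xsl:for-each>
         </xsl:function>

         <!-- Assumed input: a maps root element with map children -->
         <xsl:template match="/">
             <maps>
                 <xsl:copy-of select="ct:distinct(maps/map)"/>
             </maps>
         </xsl:template>

     </xsl:stylesheet>

Note that this sketch returns the hash-enhanced copies, so each resulting map carries an extra hash attribute that you may want to strip afterwards.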

Of course you could also group by the hash values and return each group's context item (= first item). You don't need two passes then.
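
An untested sketch of the grouping variant, under the same assumptions (the ct:hash helper, the ct namespace URI and the maps root element are hypothetical names of mine):

     <xsl:stylesheet version="2.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:ct="http://example.org/ct"
         exclude-result-prefixes="xs ct">

         <!-- Hypothetical helper: order-independent hash string for one map -->
         <xsl:function name="ct:hash" as="xs:string">
             <xsl:param name="map" as="element(map)"/>
             <xsl:variable name="pairs" as="xs:string*">
                 <xsl:for-each select="$map/singletonMap">
                     <xsl:sort select="@from"/>
                     <xsl:sort select="@to"/>
                     <xsl:sequence select="concat(@from, '=', @to)"/>
                 </xsl:for-each>
             </xsl:variable>
             <xsl:sequence select="string-join($pairs, '|')"/>
         </xsl:function>

         <xsl:function name="ct:distinct" as="element(map)*">
             <xsl:param name="maps" as="element(map)*"/>
             <!-- Inside for-each-group the context item is the first item of the group -->
             <xsl:for-each-group select="$maps" group-by="ct:hash(.)">
                 <xsl:sequence select="."/>
             </xsl:for-each-group>
         </xsl:function>

         <!-- Assumed input: a maps root element with map children -->
         <xsl:template match="/">
             <maps>
                 <xsl:copy-of select="ct:distinct(maps/map)"/>
             </maps>
         </xsl:template>

     </xsl:stylesheet>

Unlike the key-based approach, this one returns the original map elements unchanged, since the hash is only used as the grouping key.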


On 2012-08-20 00:43, Costello, Roger L. wrote:
Hi Folks,

I have a sequence of 46,656 elements that I call "maps."

Here is one map:

     <map id="Planes-Enroute-to-Airports">
         <singletonMap from="F16" to="DFW"/>
         <singletonMap from="B707" to="ORD"/>
         <singletonMap from="F35" to="MIA"/>
         <singletonMap from="S340" to="LAX"/>
         <singletonMap from="A320" to="SFO"/>
         <singletonMap from="MD90" to="DEN"/>
     </map>

I wrote a function to return all of the distinct maps.

Unfortunately it takes about 5 hours of XSLT processing.

Perhaps my XSLT program is inefficient.

I am hoping that you can show me a more efficient program or identify where my program is inefficient.

I am using XSLT 2.0.

Here is my function to return all distinct maps:

     <xsl:function name="ct:distinct" as="element(map)*">
         <xsl:param name="maps" />

         <xsl:variable name="new-maps">
             <maps>
                 <xsl:sequence select="$maps" />
             </maps>
         </xsl:variable>
         <xsl:for-each select="$new-maps/maps/map">
             <xsl:if test="not(ct:contained-within(., ./following-sibling::map))"><xsl:sequence select="." /></xsl:if>
         </xsl:for-each>
     </xsl:function>


The following function determines if a map is contained within a sequence of maps; it uses a binary divide-and-conquer approach:

     <xsl:function name="ct:contained-within" as="xs:boolean">
         <xsl:param name="map" as="element(map)"/>
         <xsl:param name="maps" as="element(map)*"/>

         <xsl:variable name="cnt" select="count($maps)" />

         <xsl:choose>
             <xsl:when test="$cnt eq 0"><xsl:value-of select="false()" /></xsl:when>
             <xsl:when test="ct:equal($map, $maps[1])"><xsl:value-of select="true()" /></xsl:when>
             <xsl:when test="$cnt eq 1"><xsl:value-of select="false()" /></xsl:when>
             <xsl:otherwise>
                 <xsl:variable name="half" select="$cnt idiv 2" />
                 <xsl:choose>
                     <xsl:when test="ct:contained-within($map, $maps[position() = (2 to $half)])"><xsl:value-of select="true()" /></xsl:when>
                     <xsl:otherwise><xsl:value-of select="ct:contained-within($map, $maps[position() = (($half + 1) to last())])" /></xsl:otherwise>
                 </xsl:choose>
             </xsl:otherwise>
         </xsl:choose>
     </xsl:function>



Two maps are equal iff for each singletonMap in map1 there is a singletonMap in map2 with the same value for @to and @from:

     <xsl:function name="ct:equal" as="xs:boolean">
         <xsl:param name="map1" as="element(map)"/>
         <xsl:param name="map2" as="element(map)"/>

         <xsl:choose>
             <xsl:when test="count($map1/*) ne count($map2/*)"><xsl:value-of select="false()" /></xsl:when>
             <xsl:otherwise>
                 <xsl:variable name="result">
                     <xsl:for-each select="$map1/singletonMap">
                         <xsl:variable name="here" select="." />
                         <xsl:if test="$map2/singletonMap[@from eq $here/@from and @to ne $here/@to]">false</xsl:if>
                     </xsl:for-each>
                 </xsl:variable>
                 <xsl:choose>
                     <xsl:when test="contains($result, 'false')"><xsl:value-of select="false()" /></xsl:when>
                     <xsl:otherwise><xsl:value-of select="true()" /></xsl:otherwise>
                 </xsl:choose>
             </xsl:otherwise>
         </xsl:choose>
     </xsl:function>



--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
