[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

Re: [xsl] Finding the maximun number of nodes


Subject: Re: [xsl] Finding the maximun number of nodes
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Mon, 8 Jan 2001 10:45:44 +0000

Hi Dimitre,

> My algorithm is quite different -- at each step (recursive call to
> traverse the rest of the nodeset) *** only the really necessary
> nodes*** are considered -- not "all the rest".

I might be missing something, but I still maintain that you cannot
work out which nodes are the necessary nodes without looking at nodes
that are unnecessary. I think the crux is:

>   <xsl:variable name="nextNodes"
>     select="$theNodes[position() > 1]
>                  [$myNumChildren &lt; count(*[name()=$childName])]"/>

which is where you're working out what the 'necessary nodes' are for
the next step. When setting this variable, each of the rest of
$theNodes are visited, and their children counted and compared to the
number of children of the first of $theNodes.

Just because these visits are hidden away in an XPath rather than in
an xsl:for-each doesn't mean that they don't happen. XPaths may (or
may not) be more efficient than an xsl:for-each, but they still
involve examining nodes. The above XPath will take longer for a larger
set of $theNodes than it would for a smaller set of $theNodes.

In fact, given a worst-case scenario of child node lengths (e.g. 1, 2,
3, 4), in the first visit to the template, the number of children for
the first node would be compared to the next three, and a node set of
the remaining three would become the $nextNodes.  When the template
was called next, the number of children of the second node would be
calculated (for a second time) and this value compared to the number
of children for the third and fourth nodes.  On the third visit, the
third node gets its children counted for a third time and so on.  I
can't work out what that means exactly, but it's not linear.

I had a similar goal (only visiting necessary nodes) when I came up
with the recursive solution that uses an XPath to locate the next
interesting node to move to. One thing that you can take advantage of
is the fact that XPaths that end with a [1] predicate are often
optimised to stop when they find the first matching node, and
therefore, as long as you move to that node next, you don't have to
visit any nodes more than once. I can't, though, immediately think of
a way to take advantage of this with an arbitrary node set (i.e. how
to get all the nodes that follow a node of interest in an arbitrary
node set).

I gave a solution for an arbitrary node set before; generalising and
altering it to make it more efficient and to match your solution more,
it's:

<xsl:template name="nodeWithMaxChildren">
  <xsl:param name="theNodes"/>
  <xsl:param name="childName"/>
        
  <xsl:variable name="myNumChildren"
    select="count($theNodes[1]/*[local-name()=$childName])"/>
         
  <xsl:choose>
    <xsl:when test="$theNodes[2]">

      <xsl:variable name="max">
        <xsl:call-template name="nodeWithMaxChildren">
          <xsl:with-param name="theNodes"
                          select="$theNodes[position() != 1]" />
          <xsl:with-param name="childName" select="$childName" />
        </xsl:call-template>
      </xsl:variable>

      <xsl:choose>
        <xsl:when test="$max &gt; $myNumChildren">
          <xsl:value-of select="$max" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$myNumChildren" />
        </xsl:otherwise>
      </xsl:choose>
      
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$myNumChildren" />
    </xsl:otherwise>
  </xsl:choose>

</xsl:template>

The recursive depth of this will always be N, but on the other hand
you only work out the number of child nodes for each node once.  [BTW,
I've used local-name() rather than name() to avoid namespace
problems.]

> As for wishes for new features in the next versions of XSLT, it is
> obvious that the speed of many such algorithms will be considerably
> increased by using a modification of the key() function that returns
> a nodeset of all nodes with value ***greater than given value*** for
> a named key.

If there was a variety of key() that allowed you to retrieve all keyed
nodes (for example key() without the second argument), then you could
do things like:

  key('mykey')[. > $minValue]

to the same effect, but yes it would probably be more efficient if the
key value test could be done behind the scenes.  Something like:

  key('mykey', saxon:expression('. > $minValue'))

Adding this kind of functionality to key() would be useful for all
those people who want to retrieve nodes with key values that *start
with* something and so on, as well.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



Current Thread
Keywords