
Re: [xsl] use-when attribute?


Subject: Re: [xsl] use-when attribute?
From: Geert Josten <Geert.Josten@xxxxxxxxxxx>
Date: Sun, 19 Dec 2004 22:45:47 +0100

Saxon builds the index for each key when that key is first used against a
particular document. In general, you don't know which keys will be used for
which document, and there's no point building an index for a key that won't
be used.

Ah, knowing that makes a difference...
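
To make that concrete, here is a minimal sketch. The key name "by-id" and the element and attribute names are made up for illustration. Going by the explanation above, Saxon would not build the index when the key is declared, only when key() is first called against a given document:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Declaring the key does not by itself build an index -->
  <xsl:key name="by-id" match="item" use="@id"/>

  <xsl:template match="/">
    <!-- First use of the key against this document: per the
         explanation above, the index would be built at this point -->
    <xsl:copy-of select="key('by-id', 'x1')"/>
  </xsl:template>

</xsl:stylesheet>

A stylesheet that declares many keys but calls only one of them would, on that reading, pay for only one index per document.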


In any case, scanning the nodes is not expensive compared with evaluating
the pattern and the use expression, so it doesn't much matter whether it's
done in one scan or several. It's not like a database, where disc accesses
are expensive and need to be minimised.

I would expect that to depend on the size of the dataset, but it is interesting to hear that evaluating one expression, even a // one, is less expensive than matching patterns. Very interesting... That opens up lots of new perspectives.


Also, forming the union of three node-sets isn't necessarily expensive in a
pipelined implementation, especially if the path expressions that produce
the three node-sets automatically deliver results in sorted order. (This is
another reason for avoiding // - an expression of the form //a/b may need to
be sorted, whereas /*/a/b doesn't).

Well, interesting that you bring this up, but the case in question did use // three times, and the results of all three were unioned.
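
As a small sketch of the two forms mentioned above (the variable names "anywhere" and "fixed" and the output element are hypothetical): the first variable uses the descendant shortcut, which may need sorting into document order, while the second spells out the path from the document element. Note that the two are only equivalent when every a element is a child of the document element.

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Descendant form: finds a/b at any depth; result may need sorting -->
  <xsl:variable name="anywhere" select="//a/b"/>

  <!-- Explicit form: only a elements directly below the document element -->
  <xsl:variable name="fixed" select="/*/a/b"/>

  <xsl:template match="/">
    <!-- Report both counts so the two selections can be compared -->
    <counts descendant="{count($anywhere)}" explicit="{count($fixed)}"/>
  </xsl:template>

</xsl:stylesheet>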


Anyhow, I created a simple test case with input like this:

<test>
  <data>
    <a>AAAAAAAAA...</a>
    <b>BBBBBBBBB...</b>
    <c>CCCCCCCCC...</c>
  </data>
  ..
  (repeated until the file size was 10 MB)
  ..
</test>

It doesn't really match Bruce's case, where the matched elements were distributed much more widely throughout the document, but this test case does clear up some things.

One stylesheet defines a global variable with select="//a|//b|//c"; the other defines a key (or three keys with the same name) with match="a|b|c" and use="'all'".

In each stylesheet, the root template outputs a root element containing the count of the nodes in either the global variable (test case 1) or the key (test case 2).
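
A minimal sketch of the two stylesheets as described. The name "all" for the variable and for the key is an assumption, and only the single-key variant of test case 2 is shown.

Test case 1, global variable:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Union of three descendant searches, evaluated once -->
  <xsl:variable name="all" select="//a|//b|//c"/>

  <xsl:template match="/">
    <root>
      <xsl:value-of select="count($all)"/>
    </root>
  </xsl:template>

</xsl:stylesheet>

Test case 2, key with a constant use value:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Every matched node is indexed under the same value 'all' -->
  <xsl:key name="all" match="a|b|c" use="'all'"/>

  <xsl:template match="/">
    <root>
      <xsl:value-of select="count(key('all', 'all'))"/>
    </root>
  </xsl:template>

</xsl:stylesheet>

Both should report the same count for the same input; only the way the node-set is collected differs.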

It was very clear that test case 2 was ***MUCH*** slower; I even had to terminate the execution. Test case 1 ran within 3 seconds with msxsl, whereas I terminated test case 2 when it had still not finished after about 20 or 30 seconds.

Despite Mike's explanation, I still don't _exactly_ understand why the key solution is so much slower in this case (it doesn't fit with my other experiences with keys, in which I achieved a large gain by using them), but it is at least very obvious that it is _not_ wise to use a key with a fixed use expression...


Thanks for the insights Mike...


Cheers,
Geert

