
Re: [xsl] Does the count() function require access to the whole subtree?


Subject: Re: [xsl] Does the count() function require access to the whole subtree?
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 16 Jan 2014 11:44:19 +0000

On 16/01/2014 10:39, Costello, Roger L. wrote:
In explaining why count(//x) is streamable whereas data(//x) is not,
David Carlisle wrote:

count() just returns a single value and the system could
conceptually go through the document once counting every time it sees
x.

data(//x) is the same as //x/data(.) and returns the data of each
element in the sequence. Now the first element in the sequence is
the outer x. To work out its data the system has to process the
_full_ content of that element (which actually is the whole
document). Then the next element of the sequence is the inner x.
Oops, we passed that already, so you need to back up to get that.
This is where the concern about "overlapping trees" comes from.
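
To make the difference concrete, here is a rough sketch in Python using the standard xml.sax event parser. It is only an analogy for a streaming XSLT processor, not how any real processor is implemented. The class name CountAndData and the whitespace-free sample document are assumptions made for the illustration. The count is a single running total, while collecting the data of each x needs one open buffer per x element that has started but not yet ended.

```python
import xml.sax

# Whitespace-free variant of the thread's sample document (an assumption,
# chosen so the string values are easy to read).
DOC = b"<Document><x><x>A</x>B</x></Document>"


class CountAndData(xml.sax.ContentHandler):
    """Single forward pass: count x elements and gather their string values."""

    def __init__(self):
        super().__init__()
        self.count = 0          # analogue of count(//x): one running total
        self.open_buffers = []  # one text buffer per x element currently open
        self.max_open = 0       # most buffers held at the same time
        self.data = []          # string values, in the order the x elements END

    def startElement(self, name, attrs):
        if name == "x":
            self.count += 1
            self.open_buffers.append([])
            self.max_open = max(self.max_open, len(self.open_buffers))

    def characters(self, content):
        # Text belongs to every x that is still open, so nested x elements
        # force the same text to be held in several buffers at once.
        for buf in self.open_buffers:
            buf.append(content)

    def endElement(self, name):
        if name == "x":
            self.data.append("".join(self.open_buffers.pop()))


handler = CountAndData()
xml.sax.parseString(DOC, handler)
print(handler.count, handler.data, handler.max_open)
```

In this sketch the inner x completes before the outer one, so the values come out as A then AB, whereas document order for //x puts the outer x first. Getting the outer value first means holding on to everything inside it, which is the backing up described above.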

Thanks David, that is an outstanding explanation.

Question: So, what is the general principle at work here?

I'll take a stab at answering that question:

Let's take as reference this XML:

<Document> <x> <x>A</x> B </x> </Document>

Consider an XPath expression that yields a sequence of <x> elements.
Now consider an operation on that sequence. Is the operation on the
sequence streamable or not?

For example, is the operation count() on the sequence generated by
the XPath expression //x streamable or not? Is the operation data()
on the sequence generated by //x streamable or not?

We must consider two cases:

Case 1: 	One or more items in the sequence has an <x> element
nested inside another <x>.

If the operation can be performed just by inspecting each item of
the sequence, then the operation on the sequence is streamable.

If the operation requires going inside each item of the sequence,
then the operation on the sequence is not streamable.

Case 2: 	There are no items in the sequence that have an <x> element
nested inside another <x>. (Each item in the sequence is disjoint)

The operation on the sequence is streamable, regardless of whether
the operation just inspects each item or goes inside each item.
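
The distinction between the two cases can itself be checked in a single forward pass. The following is a minimal sketch in Python with xml.sax, offered as an illustration rather than as anything the XSLT specification prescribes. The names NestingCheck and has_nested, and the second sample document, are hypothetical.

```python
import xml.sax


class NestingCheck(xml.sax.ContentHandler):
    """Track how many elements of the given name are open at once."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.depth = 0       # number of currently open elements called `name`
        self.nested = False  # becomes True once one opens inside another

    def startElement(self, name, attrs):
        if name == self.name:
            self.depth += 1
            if self.depth > 1:
                self.nested = True

    def endElement(self, name):
        if name == self.name:
            self.depth -= 1


def has_nested(xml_bytes, name="x"):
    """Return True if some `name` element occurs inside another (Case 1)."""
    handler = NestingCheck(name)
    xml.sax.parseString(xml_bytes, handler)
    return handler.nested


nested_doc = b"<Document> <x> <x>A</x> B </x> </Document>"   # Case 1
disjoint_doc = b"<Document> <x>A</x> <x>B</x> </Document>"   # Case 2

print(has_nested(nested_doc), has_nested(disjoint_doc))
```

Only a depth counter is kept, so the check needs no access to the content of any x.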

Is that correct?

Is it complete? Are there any cases that it misses?

Can you express it more simply and more clearly?

/Roger




It's basically correct but slightly more optimistic than it should be.


In case 2, for example, rather than saying "operation on the sequence is
streamable" you should probably say "the input sequence does not by
itself make the operation non-streamable".

The above description covers whether an operation that has a chance of
being streamable is still streamable when applied to a given sequence.

Some operations are intrinsically not streamable. If you replace data()
by reverse(), for example, then reverse(//x) has to buffer every node
somewhere and return the x elements in reverse order. Since it has to
return the last one first, the operation can't be streamed in any
reasonable way, irrespective of the input sequence (which happens to be
//x here). So nesting of x would not be relevant.
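
As a small illustration of that point, here is a sketch in plain Python (not XSLT, and not taken from the spec). The function names stream_count and buffered_reverse are made up for the example. The log shows that the reversing function has to read every item before it can emit the first one, whatever the items are.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_count(items: Iterable[str]) -> int:
    """Count items while keeping only a running total in memory."""
    total = 0
    for _ in items:
        total += 1
    return total


def buffered_reverse(items: Iterable[str],
                     log: List[Tuple[str, str]]) -> Iterator[str]:
    """Yield items in reverse order, logging each read and each emit."""
    buffered = []
    for item in items:
        log.append(("read", item))
        buffered.append(item)      # nothing can be emitted yet
    for item in reversed(buffered):
        log.append(("emit", item))
        yield item


log: List[Tuple[str, str]] = []
result = list(buffered_reverse(iter(["x1", "x2", "x3"]), log))
first_emit = next(i for i, (kind, _) in enumerate(log) if kind == "emit")
print(result, first_emit, stream_count(iter(["x1", "x2", "x3"])))
```

The first emit only appears in the log after all three reads, which is the buffering that rules out streaming.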

(Just answering from first principles, I suppose I should check the
spec and see what wording is used to categorise reverse.....
.... oh look there is a section 19.4 that says:

[Definition: An operand usage of navigation indicates that the
construct may navigate freely from the supplied node to other nodes
in the same tree, in a way that is not constrained by the
streamability rules.] This covers several cases: cases where it is
known that the construct performs impermissible navigation (for
example, the xsl:number instruction) or reordering (the reverse
function), ...)




David






