[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

RE: [xsl] Computational complexity of accessing the Nth item in a sequence and in a node-set


Subject: RE: [xsl] Computational complexity of accessing the Nth item in a sequence and in a node-set
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 4 Jan 2005 10:23:32 -0000

> This prompts a few other questions I've had for some time:
> 
> When a sequence is passed as a parameter, what is actually passed --
> its copy or a reference to the sequence?

A reference.
> 
> If there's actually such thing as a reference to a sequence, then why
> should calculating the hash key of this reference take longer than
> scanning through many items until the last (which may be thousands of
> items away)?

You're right that for the purposes of memo functions, it would probably make
sense to re-use the previous results only when the "same sequence" is
passed, rather than an "equal sequence". I think the only reason that
doesn't happen today is habit: there is no concept of sequence identity in
the language semantics.
> 
> Are subsequences shared so that when an item is added at either end
> the original sequence is not copied, but a reference to itself is
> used?
> 
If a sequence is materialized, then it is usually materialized in full.
However, Saxon goes to great lengths to avoid materializing a sequence in
memory. The cases where a sequence is materialized include when it is
sorted, or when it is held in a variable to which there is more than one
reference. A sequence represented intensionally as "subsequence(A, 2, 5)" or
"(A, '12')" (referred to as a closure) is sharing storage in the way you
describe, but as an expression tree rather than a data structure with
explicit references.

There is some special-casing for head-tail recursion: an intensional
sequence of the form
subsequence(subsequence(A, 2), 2) is collapsed into the form subsequence(A,
3), to avoid deep chains of closures being created. In other cases, if the
chain of closures exceeds a certain depth, the sequence is materialized in
memory.

There's always more scope to refine the rules further, of course. For
example, there are cases where the value of a variable should ideally be
evaluated anew on each reference rather than being stored in memory, for
example if it's a simple concatenation like (A, B). Meanwhile, if you want
repeated evaluation, use a function rather than a variable.

Michael Kay
http://www.saxonica.com/


Current Thread