[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

RE: [xsl] Normalize / Simplify HTML-Tables with row-span / col-span


Subject: RE: [xsl] Normalize / Simplify HTML-Tables with row-span / col-span
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 18 Feb 2004 17:36:48 -0000

The FragmentValue class in Saxon 6 is an implementation of the data
model that tries to optimize for the usage pattern of XSLT 1.0 result
tree fragments, including use as a "node-set" where that is required. It
does this by building the tree lazily, delaying its construction until a
tree is actually needed. The initial structure is essentially a list of
SAX events. If the user asks for the string value, this is available
directly. If the user does xsl:copy-of, the SAX events are played back
to append nodes to the new tree. Neither of these cause the tree to be
built. On certain other constructs, one of which is the node-set()
function, a tree is built locally. Once the tree has been built, there
is a bit to indicate whether generalized usage of the tree is permitted.
The node-set() function is not the only thing that forces construction
of a tree, though I don't recall in detail what the other conditions are
(Saxon 6 was a long time ago...). There are messy tests scattered all
around the code trying to avoid triggering the building of the tree,
e.g. when doing an equality comparison and when calling the document()
function (which is special because it needs to know the base URI), but
they don't cover all cases.

The performance bug that you refer to occurs during the initial
construction of the data structure that's there to avoid the costs of
building a generalized tree. If there are any lessons to be drawn from
it, it is the lesson (which I've certainly learnt over the years with
Saxon) that every optimization risks introducing bugs, and some of these
bugs aren't found for a long time. It's not quite true that optimization
is the root of all evil, but it's certainly something you have to
balance against reliability.

In Saxon 7 I scrapped all of this and use the same tree model as is used
for source trees. It's much simpler, faster, and more reliable.

Michael Kay

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> David Tolpin
> Sent: 18 February 2004 15:34
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] Normalize / Simplify HTML-Tables with 
> row-span / col-span
> 
> 
> > No, you've got it completely the wrong way around! Saxon uses a 
> > different data structure for RTFs from its normal tree 
> model, and it's 
> > this different data structure that has a bug in it!
> 
> Michael,
> 
> 1) it was your words that the only thing node-set does is 
> flipping a bit. 
> 
> 2) i provided the test that shows that the same bug occurs 
> whether node-set is used or not on an RTF.
> 
> 3) the code to build RTF creates a subclass of SingletonNodeSet, 
> FragmentValue; which is where the bug is. 
> 
> 4) exslt/Common.java contains the code which I think is used 
> to implement node-set, namely, method Common.nodeSet, which 
> only calls allowGeneralUse for instances of SingletonNodeSet; 
> the full text of this method is here:
> 
>     public void allowGeneralUse() {
>         generalUseAllowed = true; 
>     }
> 
> Where SingletonNodeSet, used to build RTF and represent 
> node-set after the call to nodeSet is converted to a 
> different structure? 
> 
> 
> > But you can't prove anything from bugs, especially 
> performance bugs. 
> > They can happen anywhere at any time. This one happened 
> because RTFs 
> > are usually small, which meant that a scalability problem 
> in this area 
> > didn't show up in all the standard performance tests; the 
> same bug in 
> > the tree model used for input documents would have shown up very 
> > quickly.
> 
> I am not proving, I am illustrating. And no, it didn't happen 
> because of incomplete or insufficient testing. It happened 
> because there is not a single word about required 
> computational complexity of XSLT constructs. If the spec said 
> that adding a node to RTF must not be more complex than O(n), 
> it would not happen.
> 
> David Tolpin
> http://davidashen.net/
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



Current Thread
Keywords