
Re: [xsl] GByte Transforms


From: Jeff Kenton <jeffrey.kenton@xxxxxxxxxxx>
Date: Wed, 02 Jun 2004 16:40:05 -0400


Some thoughts below:


Kevin Jones wrote:

...
The quest, then, is to find ways of writing stylesheets for these types of transform that give predictable performance. Ideally a processor would handle this without any fuss, but turning an arbitrary stylesheet into one that executes in linear time will never be viable for XSLT. Some of the more practical ideas we have been kicking around are:

Ignore the problem and leave it to the stylesheet writer's testing.

Can't do this. People do the strangest things in stylesheets, as any reader of this list knows. Your job is to take anything a customer might throw your way, no matter how weird, and "do the right thing".



Extra smarts in the compiler to warn of potentially non-linear behaviour, e.g. recursive templates that are not tail recursive, or nested loop/template constructions.


As above but aided by structural information for better targeting.

Sure, the more diagnostics for the user, the better. But be prepared for users ignoring them, and customers that set things up so that users never see the warnings.



As above but with automatic re-writing where possible.

This will help. You can rewrite backwards axes, and certain predicates, so that you can process the input in a single forward pass. (I'm sure you know this already.) Some other axes, and expressions that start at the root of the input document, can't be done that way.
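As an illustration of the forward-pass idea, here is a minimal Python sketch. Python's xml.etree.ElementTree.iterparse stands in for a streaming processor; this is not how any particular XSLT engine does it. It assumes a hypothetical flat document of <item price="..."/> siblings under a single parent, and replaces the backward step preceding-sibling::item[1]/@price with one value carried forward while the input is read once.

```python
import io
import xml.etree.ElementTree as ET


def price_deltas(source):
    """Yield (price, delta) for each <item>, where delta is the change from
    the preceding sibling <item> (None for the first one).

    Assumes all <item> elements are siblings under one parent (hypothetical
    layout). Instead of evaluating the backward step
    preceding-sibling::item[1]/@price, which needs earlier nodes to still be
    in memory, the single value that step would return is carried forward.
    """
    previous = None  # plays the role of preceding-sibling::item[1]/@price
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        price = float(elem.get("price"))
        yield price, (None if previous is None else price - previous)
        previous = price
        elem.clear()  # drop the item's content; it is no longer needed
```

The only state kept between nodes is the previous price, which is what makes a single forward pass possible here; a predicate that needed arbitrary earlier siblings would not reduce this neatly.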



Optional runtime monitoring for non-linear behaviours, perhaps as an addition to profile information.


Runtime stop limits, e.g. terminate if the predicted execution time (as monitored by the runtime) exceeds a limit.
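A runtime stop limit might look something like the Python sketch below. It rests on assumptions: "predicted execution time" is approximated by a count of work units (one per comparison), the limit is an arbitrary multiple of the input size, and StepBudget, StepLimitExceeded and join_nested are names invented for this example, not part of any processor's API.

```python
class StepLimitExceeded(Exception):
    """Raised when a transform uses more work than its budget allows."""


class StepBudget:
    """Hypothetical runtime stop limit. Work is counted in abstract steps
    as a stand-in for predicted execution time. The limit is a multiple of
    the input size, so roughly linear work stays under it while quadratic
    work on a large input trips it."""

    def __init__(self, input_size, factor=50):
        self.limit = input_size * factor
        self.steps = 0

    def tick(self, n=1):
        self.steps += n
        if self.steps > self.limit:
            raise StepLimitExceeded(
                f"{self.steps} steps exceeds the limit of {self.limit}")


def join_nested(orders, customers, budget):
    """Nested-loop join: for every (order_id, customer_id) pair, scan the
    customer list for a match, much as a naive xsl:for-each nested inside
    another might. Work grows as len(orders) * len(customers)."""
    result = []
    for order_id, cust_id in orders:
        for cid, name in customers:
            budget.tick()  # one step per comparison
            if cid == cust_id:
                result.append((order_id, name))
                break
    return result


if __name__ == "__main__":
    customers = [(i, f"customer-{i}") for i in range(3000)]
    orders = [(i, 2999) for i in range(3000)]  # every order hits the last customer
    try:
        join_nested(orders, customers, StepBudget(len(orders) + len(customers)))
    except StepLimitExceeded as exc:
        print("stopped:", exc)
```

With a handful of orders and customers the join finishes well under budget; with a few thousand of each, the nested loop exceeds the limit long before completing, which is the point where a processor could stop and report rather than carry on.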

Replacing non-linear algorithms with linear but slower ones; this applies to both the runtime and the compiler.
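To make the replacement idea concrete, the Python sketch below puts a quadratic per-node rescan (roughly what a naive evaluation of preceding-sibling::*[. = current()] does for each node) next to a one-pass version that keeps a running count per distinct value, similar in spirit to an index built with xsl:key. The function names are hypothetical. Whether the linear version is actually slower on small inputs depends on the implementation; the dictionary is the extra memory and per-node cost being accepted in exchange for linear growth.

```python
from collections import Counter


def earlier_matches_naive(values):
    """For each position, count earlier values equal to this one by
    rescanning the whole prefix: about n*(n-1)/2 comparisons in total."""
    return [sum(1 for w in values[:i] if w == v) for i, v in enumerate(values)]


def earlier_matches_linear(values):
    """Same result in one pass, using a running count per distinct value."""
    seen = Counter()
    out = []
    for v in values:
        out.append(seen[v])  # how many equal values came before this one
        seen[v] += 1
    return out
```

Both functions return the same list for any input; only the growth rate differs, which is exactly the property a compiler or runtime would need to preserve when it swaps one for the other.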

Define a subset of XSLT to limit the scope for non-linear transforms.

It often comes down to that. The other way to look at the problem of "streaming" large input files is to analyze the stylesheet and try to decide how much of the input you need to keep during processing. For some operations, only the current node is necessary. More often, keeping just the path from the root to the current node will work (as another poster suggested). Sometimes, you need the entire input tree, and you're not really "streaming" anymore. Consider it a continuum, rather than just a binary "can I stream this stylesheet or not" question.
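The "path from the root to the current node" case can be sketched with Python's xml.etree.ElementTree.iterparse: keep a stack of the currently open elements, handle each record as soon as its end tag arrives, then clear it and detach it from its parent. Memory use then tracks the ancestor path and the current record rather than the whole file, assuming records make up the bulk of the input. The <record> element with an id attribute and a <name> child is a hypothetical layout, and this illustrates the idea rather than how any particular XSLT processor streams.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, wanted="record"):
    """Yield (path, id, name) for each <wanted> element in document order,
    keeping only the ancestors of the current node in memory."""
    stack = []  # open elements, from the root down to the current node
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem's end tag has arrived, so it is complete
        if elem.tag == wanted:
            path = "/".join([e.tag for e in stack] + [elem.tag])
            yield path, elem.get("id"), elem.findtext("name")
            elem.clear()  # drop the record's children and attributes
            if stack:
                stack[-1].remove(elem)  # detach it so the parent does not grow


if __name__ == "__main__":
    doc = b"""<db><batch>
      <record id="1"><name>alpha</name></record>
      <record id="2"><name>beta</name></record>
    </batch></db>"""
    for row in stream_records(io.BytesIO(doc)):
        print(row)
```

If a stylesheet needed something outside that window, say a lookup against a sibling batch, this approach would no longer suffice, which is the continuum described above.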



And, perhaps the most controversial: don't provide this support for XSLT at all, only for XQuery, where predictability should be better.

To me, this isn't so controversial, but there are two sides. XQuery is certainly more predictable, so this has merit. But a lot of the same problems still exist in theory, even if they're less likely to be seen in practice. No reason to give up on XSLT. Furthermore, most huge input files have a very regular structure, and should be suitable for simple stylesheets. It may go back to educating your users, even though I disparaged that above, as long as you provide the best tools you can to help them.


jeff




--


-------------------------------------------------------------------------
=    Jeff Kenton      Consulting and software development               =
=                     http://home.comcast.net/~jeffrey.kenton           =
-------------------------------------------------------------------------




