Re: [xsl] Max size?


Subject: Re: [xsl] Max size?
From: Joseph Kesselman <keshlam@xxxxxxxxxx>
Date: Thu, 9 Jan 2003 11:05:44 -0500

>> Xalan is capable of "streaming processing".
>The interesting challenge is to work out when you can discard parts of
>the tree that won't be needed again. I think this could be done quite
>easily for a small class of very simple stylesheets, but the general
>problem is quite hard.

That's been our conclusion. XSLT's semantics require at least the 
appearance of having the whole document in memory at once. Figuring out 
how to reduce


The terminology Xalan uses for these issues:


Incremental: We can build the source model "on demand" (e.g., by
"throttling" the incoming SAX stream, or by using Xerces incremental 
parsing). If your stylesheet doesn't need to examine the whole source 
document, this can reduce the resources required. It does have some 
throughput costs. IMPLEMENTED; optional due to the performance trade-off.
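
To make the "on demand" idea concrete, here is a minimal sketch in Python rather than in Xalan's own Java. It is not Xalan code: first_match and its chunk_size parameter are hypothetical names, and the standard library's pull parser stands in for a throttled SAX stream. The point is only that input is read until the requested node exists, and no further.

```python
import io
import xml.etree.ElementTree as ET


def first_match(stream, tag, chunk_size=16):
    """Return (element, characters_consumed) for the first completed <tag>.

    Input is pulled from `stream` only until the match is found, so the
    rest of the document is never parsed or stored.
    """
    parser = ET.XMLPullParser(events=("end",))
    consumed = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None, consumed  # input exhausted without a match
        consumed += len(chunk)
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                return elem, consumed


# A document whose interesting part is near the front.
document = "<doc><title>Report</title>" + "<row>x</row>" * 1000 + "</doc>"
title, consumed = first_match(io.StringIO(document), "title")
print(title.text, consumed, len(document))
```

The small chunk size is what costs throughput here: many short reads instead of one large one, in exchange for stopping early.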

Streaming: In incremental mode, we can begin generating output before the 
entire source document has been read. This reduces latency, which can be a 
major advantage when the next stage of processing (e.g., a browser) can
itself operate in a streaming mode and begin displaying data immediately; 
the user sees the system as more responsive despite the throughput costs. 
IMPLEMENTED.
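
A rough illustration of the latency benefit, again as a Python sketch and not as Xalan's implementation. The function stream_items_as_html, the "item" element name, and the simulated source are all assumptions made up for this example. The "transformation" is deliberately trivial (each item becomes a list entry) so that output can be emitted as soon as each input element is complete.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def stream_items_as_html(chunks):
    """Yield one <li> line per <item> as soon as that item has been parsed."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                yield "<li>%s</li>" % escape(elem.text or "")


pulled = []  # records how much input has been handed over so far


def source():
    """Simulated slow input: notes each chunk as it is requested."""
    parts = ["<list>"] + ["<item>n%d</item>" % i for i in range(100)] + ["</list>"]
    for part in parts:
        pulled.append(part)
        yield part


output = stream_items_as_html(source())
first = next(output)
print(first, "produced after", len(pulled), "of 102 input chunks")
```

The consumer of the generator sees the first result while most of the input is still unread, which is the responsiveness gain described above.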

Filtering: An optimization consisting of not building portions of the 
source model which stylesheet analysis proves will never be referenced by 
the stylesheet. Conceptually straightforward, but runs into the "halting
problem" to some extent; may be hard to apply generally. May require some
rewriting of the stylesheet and/or retaining of "stub" branches of the 
tree to avoid breaking XPaths. NOT IMPLEMENTED at this time.
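
The mechanical half of filtering can be sketched as follows, in Python and with the hard part left out: the set of element names to skip is supplied by hand, where a real implementation would have to derive it from stylesheet analysis. FilteringBuilder, build_filtered, and keep_stubs are hypothetical names for this illustration. The keep_stubs option shows one reading of the "stub" idea: an empty placeholder element is retained so that sibling positions and child counts seen by an XPath do not shift.

```python
import xml.sax
import xml.etree.ElementTree as ET


class FilteringBuilder(xml.sax.ContentHandler):
    """Builds an ElementTree from SAX events, skipping named subtrees."""

    def __init__(self, skip_tags, keep_stubs=False):
        super().__init__()
        self._skip = set(skip_tags)
        self._keep_stubs = keep_stubs
        self._builder = ET.TreeBuilder()
        self._skip_depth = 0  # > 0 while inside a skipped subtree

    def startElement(self, name, attrs):
        if self._skip_depth:
            self._skip_depth += 1
            return
        if name in self._skip:
            self._skip_depth = 1
            if self._keep_stubs:
                # Keep an empty placeholder so positions stay stable.
                self._builder.start(name, {})
                self._builder.end(name)
            return
        self._builder.start(name, dict(attrs.items()))

    def endElement(self, name):
        if self._skip_depth:
            self._skip_depth -= 1
            return
        self._builder.end(name)

    def characters(self, content):
        if not self._skip_depth:
            self._builder.data(content)

    def close(self):
        return self._builder.close()


def build_filtered(xml_text, skip_tags, keep_stubs=False):
    handler = FilteringBuilder(skip_tags, keep_stubs)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.close()


sample = "<doc><a>1</a><big><x/><y>t</y></big><a>2</a></doc>"
filtered = build_filtered(sample, {"big"})
stubbed = build_filtered(sample, {"big"}, keep_stubs=True)
print(ET.tostring(filtered, encoding="unicode"))
print(ET.tostring(stubbed, encoding="unicode"))
```

Without stubs, an expression such as "the third child of doc" no longer selects the same node as it did in the unfiltered tree, which is the kind of XPath breakage mentioned above.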

Pruning: An optimization consisting of discarding portions of the source 
model which stylesheet analysis proves will never again be referenced by
the stylesheet. Similar issues to filtering. Some of the optimizations in 
our internal data structures (DTM) fight with this approach; as currently 
implemented, DTM can be "tail pruned" fairly easily (remove the most 
recently added subtree) but general pruning is challenging. We currently 
do some tail-pruning to manage RTF/Temporary Tree storage, but pruning of 
the source document is NOT IMPLEMENTED at this time.
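
Why tail pruning is the easy case can be shown with a toy model. This is a Python sketch under a stated assumption, not a description of the actual DTM layout: nodes are assumed to be appended to flat arrays in arrival order and identified by index, so every parent has a smaller index than its children. NodeTable and its methods are hypothetical names.

```python
class NodeTable:
    """Toy append-only node table; a node's identity is its array index."""

    def __init__(self):
        self.names = []
        self.parents = []  # parent index per node, -1 for a top-level node

    def add(self, name, parent=-1):
        if not -1 <= parent < len(self.names):
            raise ValueError("parent must be an existing node or -1")
        self.names.append(name)
        self.parents.append(parent)
        return len(self.names) - 1

    def is_tail_subtree(self, root):
        """True if every node added after `root` is a descendant of it."""
        # Parents always precede children, so it is enough that each later
        # node's parent index falls inside the range starting at `root`.
        return all(p >= root for p in self.parents[root + 1:])

    def tail_prune(self, root):
        """Discard `root` and its subtree by truncating the arrays."""
        if not self.is_tail_subtree(root):
            raise ValueError("not the most recently added subtree")
        del self.names[root:]
        del self.parents[root:]


table = NodeTable()
doc = table.add("doc")
table.add("a", doc)
rtf = table.add("rtf-root")      # a temporary tree added after the source
x = table.add("x", rtf)
table.add("y", x)
table.tail_prune(rtf)            # cheap: just cut the arrays back
print(table.names)
```

Removing a subtree from the middle of such a table would leave a hole or force renumbering of every later node, which is one way to see why general pruning is harder than the tail case.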

______________________________________
Joe Kesselman  / IBM Research


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


