[XSL-LIST Mailing List Archive Home]
[By Thread]
[By Date]
Hello,
(2) lay out the first couple of screenfuls of the document nicely, then
display a [More...] button to get the rest. Usually when I'm viewing XML
in a browser I just want to see what it looks like in general, or else the datum I'm looking for is in the first page or two.
But actually, I don't need the answer to that question. This is a
Firefox-specific application, so for practical purposes, all that really matters here is how fast does Firefox's processor (Trax) run the stylesheet? Well, there's one way to find out.
[1] Or in pathological cases, it could be O(s^2) where s is the size of the input document. I think.
[2] This version gives a broader area to click on to collapse/expand an element, and also provides a button to collapse/expand all children.
[xsl] efficiently detecting size of input document?
Subject: [xsl] efficiently detecting size of input document? From: Lars Huttar <huttarl@xxxxxxxxx> Date: Wed, 13 Aug 2008 15:48:07 -0500 |
Hello,
This is a basically solved question... amazing how you find answers when you're composing an email to this mailing list!
Maybe somebody working on a similar problem will find this helpful. And I'd be interested in any suggestions for improvement.
I've been working on the default XSLT stylesheet that Firefox uses to display an XML document to the user. I have something that provides a couple of handy extra features[2], but it's relatively slow. That's no problem for a 5KB XML document, but if you visit a URL and it turns out to be a 10MB document, Firefox basically becomes unresponsive and starts allocating large chunks of memory until it has processed the whole document. The strategies I've thought of for fixing this are either:
(1) detect the doc size in advance, and if it's large, call a template that presents a very no-frills view of the document; or (preferably)
(2) lay out the first couple of screenfuls of the document nicely, then
display a [More...] button to get the rest. Usually when I'm viewing XML
in a browser I just want to see what it looks like in general, or else the datum I'm looking for is in the first page or two.
For (1), the difficulty is, I don't know how to efficiently detect the doc size in XSLT. Sure you could count(//*) or even sum string-length(), but by the time you were done counting you would already have lost any time savings you hoped to gain. I would imagine that since the input could be a SAX stream rather than a file, there really is no quick way to predetermine the document size that is guaranteed to be available.
For (2), the challenge would be how to process the first [certain amount] of the document and no more, again without inefficiently counting nodes.
One idea was to use <xsl:number level="any">, something like <xsl:variable name="element-num"><xsl:number count="*" level="any" /></xsl:variable> <xsl:if test="$element-num < $element-limit"> ... <!-- process this node and its descendants -->
Depending on how xsl:number is implemented, this might be efficient; or it might be O(n^2), where n is the element-limit.[1]
The way it's described in the XSLT Programmer's Reference (2nd ed), it sounds like O(n^2): "Starting at the current node, the processor walks backwards through the document in reverse document order, counting the number of nodes that match the count pattern..."
I realize that this description is probably meant to specify the semantics of xsl:number, not its implementation or performance characteristics. So XSLT processors might not implement xsl:number such that each invocation takes time proportional to the number of preceding nodes. But for a statement as general as xsl:number, with arbitrary count and from attributes, how likely is a processor to be smart enough to implement my use case efficiently? It doesn't seem trivial.
But actually, I don't need the answer to that question. This is a
Firefox-specific application, so for practical purposes, all that really matters here is how fast does Firefox's processor (Trax) run the stylesheet? Well, there's one way to find out.
OK, I tried it, and it's not too bad! The 10MB document took about 3 sec to render the first 1000 elements. I think that's acceptable!
One might hope that a similar stylesheet could be used for prettyprinting XML efficiently in other environments... but the present problem is solved to my satisfaction.
Regards, Lars
[1] Or in pathological cases, it could be O(s^2) where s is the size of the input document. I think.
[2] This version gives a broader area to click on to collapse/expand an element, and also provides a button to collapse/expand all children.
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
[xsl] Re: XPath Against OpenXML and, Laura Porter | Thread | Re: [xsl] efficiently detecting siz, Andrew Welch |
RE: [xsl] Sorting, but then splitti, Dan Acuff | Date | RE: [xsl] Sorting, but then splitti, Michael Kay |
Month |