
Re: [xsl] How to stream-process non-XML text using unparsed-text-lines( ) ?


Subject: Re: [xsl] How to stream-process non-XML text using unparsed-text-lines( ) ?
From: "Abel Braaksma (Exselt) abel@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 26 Jul 2014 18:46:31 -0000

> Is unparsed-text-lines(...) always streaming,

It depends on the processor and on how the code is written, which I think is
also what Michael Kay was trying to point out. If you keep a reference to the
result of fn:unparsed-text-lines and move back and forth through the sequence,
there is little chance that it will be streamed. With the xsl:for-each
approach, and probably with the simple map operator "!" (SimpleMapExpr), you
can be quite certain that it is streamed.

For example, this is streamable, and it effectively filters the sequence so
that you work only with the lines you are interested in:
    unparsed-text-lines(...)!.[contains(., 'hello world')]
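A minimal sketch of the xsl:for-each approach follows. The parameter name
input-uri, its default value and the template name main are assumptions made
for illustration only; they are not taken from the original stylesheet in
this thread. Whether a given processor actually streams this remains
implementation-dependent.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: URI of the text file, resolved against
       the base URI of the stylesheet when relative -->
  <xsl:param name="input-uri" select="'input.txt'"/>

  <!-- Start here by invoking "main" as the initial named template -->
  <xsl:template name="main">
    <!-- Each line is visited once, in order, and never referenced again,
         which leaves the processor free to read the file lazily -->
    <xsl:for-each
        select="unparsed-text-lines($input-uri)[contains(., 'hello world')]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Binding the result of unparsed-text-lines to a variable and indexing into it
repeatedly would be the opposite pattern, and would most likely force the
whole file into memory.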

> or is it streaming only when the
> XSLT contains:
>
> 	<xsl:mode streamable="yes" />

Whether fn:unparsed-text and fn:unparsed-text-lines are streamed does not
depend on an xsl:mode with streamable="yes" (or, more generally, on whether
you are in a streamable context or not). The reason is subtle: those functions
return strings, not nodes, and the streamability rules apply only to streamed
nodes.

We have considered adding rules specifically for this situation, but it proved
too complex and/or not worth the effort (I can't really recall which), because
the use cases are already covered by the definition of fn:unparsed-text-lines
and the freedom it gives processors in how they implement it.

>
> The below XSLT program works great. Would it work differently if I were to
> remove the xsl:mode? How do I know that the input is being streamed?

Yes, it would work the same. The only sure way that I know of to find out
whether the input is actually being streamed is to use a large input document
and monitor memory use. However, even if you do see a lot of memory consumed,
the processor may still be streaming, because it might consider a certain
amount of buffering, potentially up to the size of available memory,
beneficial in a particular scenario. To rule that out, the input file must be
larger than available memory.

Note: if you want a sure-fire, cross-processor way of streaming text, you can
also create a UriResolver that returns the text document as an XML document
with a (large) set of elements, each containing one line or record of the text
input. Then you can use xsl:stream and write a guaranteed-streamable
stylesheet. This turns out to be relatively simple: your input will be flat,
and each time you select a text node, the streamability rules take into
account that it cannot contain children, so the rules are a bit more lenient.
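A rough sketch of such a stylesheet is given here. It assumes that the
resolver (not shown) answers the hypothetical URI input-lines.xml with a
document of the form <lines><line>...</line>...</lines>; these element names
and the URI are invented for illustration. It also uses xsl:source-document,
the name under which the xsl:stream instruction appears in the final XSLT 3.0
Recommendation; with a processor that implements the earlier working draft,
xsl:stream would take its place. The path expression is believed to satisfy
the streamability rules, but that is best confirmed with your processor.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical URI, assumed to be served by a custom resolver as
       <lines><line>...</line>...</lines>, one element per text line -->
  <xsl:param name="wrapped-uri" select="'input-lines.xml'"/>

  <xsl:template name="main">
    <xsl:source-document href="{$wrapped-uri}" streamable="yes">
      <!-- string() turns each streamed line element into a plain string
           in a single downward pass; the predicate then filters those
           strings, which are no longer streamed nodes -->
      <xsl:for-each
          select="lines/line/string()[contains(., 'hello world')]">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the unparsed-text-lines variant, a conforming streaming processor is
required to reject this stylesheet statically if it is not streamable, so
acceptance by the processor is itself the evidence that streaming applies.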

Cheers,

Abel Braaksma
Exselt XSLT 3.0 streaming processor
http://exselt.net

