
Re: [xsl] How to stream-process non-XML text using unparsed-text-lines( ) ?

From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 25 Jul 2014 07:09:54 -0000

On Thu, Jul 24, 2014 at 10:07 AM, <mike@xxxxxxxxxxxx> wrote:
let $x := unparsed-text-lines(...)
return ($x[1000], $x[10])

Saxon implements $x using a data structure called a MemoClosure where items
from the input are read on-demand, and then remembered. Reading $x[1000]
will cause the first 1000 items to be read and retained in memory; reading
$x[10] finds that the tenth item is already in memory.
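
The read-on-demand-and-remember behaviour described above can be illustrated outside XPath. The following is only a rough Python sketch of the idea, not Saxon's MemoClosure code; the class name MemoSequence, its item() method, and the reads counter are hypothetical names introduced for illustration.

```python
from typing import Iterable, Iterator, List, Optional


class MemoSequence:
    """Hypothetical memoizing lazy sequence with 1-based positions, as in XPath."""

    def __init__(self, source: Iterable[str]) -> None:
        self._source: Iterator[str] = iter(source)
        self._cache: List[str] = []  # every item read so far is retained
        self.reads = 0               # number of items pulled from the source

    def item(self, position: int) -> Optional[str]:
        """Return the item at the given 1-based position, or None if out of range."""
        if position < 1:
            return None
        # Pull from the underlying source only as far as needed.
        while len(self._cache) < position:
            try:
                self._cache.append(next(self._source))
            except StopIteration:
                return None
            self.reads += 1
        return self._cache[position - 1]


# A generator stands in for the lines of a large text resource.
lines = MemoSequence(f"line {n}" for n in range(1, 5001))
first = lines.item(1000)          # reads and retains items 1..1000
reads_after_first = lines.reads
second = lines.item(10)           # served from the cache, nothing new is read
```

In this sketch the memory cost is proportional to the highest position requested so far, which is the cost the discussion that follows is concerned with.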

Maybe don't store all 1000 lines, only the last one, and just build a map N --> offset(N) giving the offset of the Nth line, for N < 1000?
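
To make the suggestion concrete, here is a rough Python sketch of such an offset map. It is an illustration under assumptions, not anything Saxon does: the helper names build_offsets and line_at are hypothetical, and the sketch assumes a seekable local file in a known encoding, which may not hold for a resource fetched by URI.

```python
import os
import tempfile
from typing import BinaryIO, List


def build_offsets(f: BinaryIO, upto: int) -> List[int]:
    """Record the byte offset at which each of lines 1..upto starts."""
    offsets: List[int] = []
    f.seek(0)
    while len(offsets) < upto:
        pos = f.tell()
        if not f.readline():  # end of file reached before `upto` lines
            break
        offsets.append(pos)
    return offsets


def line_at(f: BinaryIO, offsets: List[int], n: int, encoding: str = "utf-8") -> str:
    """Re-read line n (1-based) by seeking to its recorded offset."""
    f.seek(offsets[n - 1])
    return f.readline().decode(encoding).rstrip("\r\n")


# Demonstration on a throwaway file of 2000 short lines.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("".join(f"line {n}\n" for n in range(1, 2001)).encode("utf-8"))
    path = tmp.name

with open(path, "rb") as f:
    offsets = build_offsets(f, 1000)        # 1000 integers instead of 1000 strings
    thousandth = line_at(f, offsets, 1000)
    tenth = line_at(f, offsets, 10)         # costs a seek and a re-read
os.remove(path)
```

The trade-off in this sketch is memory against repeated I/O: only one integer per line is retained, but every backward reference goes back to the file.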

The current mechanism isn't specialized to unparsed-text-lines(); it works on any iterable XPath expression (which means nearly any XPath expression). I certainly wouldn't want to create a mechanism specific to this particular case.

I have been working on making the StringValue object used to hold each of the strings more lightweight. This came out of an exercise to reduce the amount of memory used to enforce key/keyref constraints in XSD when validating a gigabyte-sized instance document, and the savings will benefit many other use cases.

Michael Kay
