[xsl] Streaming and mapping plain text


Subject: [xsl] Streaming and mapping plain text
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Tue, 17 Sep 2013 15:25:09 -0400

Hi,

Like Roger, I have some questions about streaming in XSLT 3.0.

Consider the classic problem of mapping CSV into XML. Assume we
have files larger than 1 GB, so we wish to stream.

Assume also that the lines in the CSV input need to be grouped: the
output will be XML documents containing data sets from adjacent lines,
based on common values in a designated field in those lines. But which
field serves as the key has to be parameterized, because not every CSV
input will have this cell in the same position in the row.

We can easily map each line to a sequence of cell elements:

<line><cell>1</cell><cell>2</cell><cell>3</cell></line>

Since we know the mapping we wish to use, we can also mark the cell we
wish to group on:

<line><keycell>1</keycell><cell>2</cell><cell>3</cell></line>
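To make the per-line mapping concrete, here is a minimal Python sketch (not XSLT) of turning one CSV line into the `<line>` element above, with the key cell chosen by a parameter. The function name `line_to_xml` and the 1-based `key_pos` parameter are hypothetical; `csv.reader` is swapped in for a plain split so that quoted fields containing the delimiter survive, which a bare `tokenize()` would not handle.

```python
import csv
import xml.etree.ElementTree as ET

def line_to_xml(text: str, key_pos: int, delimiter: str = ",") -> ET.Element:
    """Map one CSV line to <line>, marking the cell at 1-based key_pos as <keycell>.

    key_pos is a hypothetical parameter standing in for the per-mapping
    position of the grouping field.
    """
    # csv.reader handles quoted fields; a plain split on the delimiter would not.
    fields = next(csv.reader([text], delimiter=delimiter))
    line = ET.Element("line")
    for i, value in enumerate(fields, start=1):
        cell = ET.SubElement(line, "keycell" if i == key_pos else "cell")
        cell.text = value
    return line

print(ET.tostring(line_to_xml("1,2,3", key_pos=1), encoding="unicode"))
# <line><keycell>1</keycell><cell>2</cell><cell>3</cell></line>
```

Changing `key_pos` to 2 marks the second cell instead, which is the parameterization described above.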

(Maybe next time the second cell, not the first, will be keycell.)

Then we can use xsl:for-each-group with group-adjacent="keycell" over
the lines to collect our sequences of lines.
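As a rough analogue of group-adjacent="keycell" applied lazily to a stream of already-mapped lines, here is a Python sketch using `itertools.groupby`, which groups only consecutive items with equal keys. The function name and the `<group key="...">` wrapper element are hypothetical choices for illustration. Each run is materialized as one group, so memory is bounded by the largest group rather than by the whole file.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator

def group_adjacent_by_keycell(lines: Iterable[ET.Element]) -> Iterator[ET.Element]:
    """Yield one hypothetical <group key="..."> per run of adjacent <line>
    elements sharing the same <keycell> value.

    Input is consumed lazily; only the current run is held in memory.
    """
    for key, run in itertools.groupby(lines, key=lambda e: e.findtext("keycell")):
        group = ET.Element("group", key=key or "")
        for line in run:  # append one by one; run is a one-shot iterator
            group.append(line)
        yield group
```

Because `groupby` never looks back, a key that reappears later (a, a, b, a) starts a new group, matching group-adjacent rather than group-by semantics.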

My question is how can this be streamed most effectively?

If the output of one streamed pass can itself be streamed into
another, maybe the best way is first to stream the lines, applying the
mapping I need to generate XML, and then stream those lines into
sequences of grouped lines.

If, however, I can only stream the plain-text input, and cannot stream
the lines generated in my first pass (with cells marked) into a second
pass (to group them), then I need to group the lines in the first
pass, using group-adjacent not on the value of 'keycell' (which isn't
known yet) but on, say, tokenize(., $delimiter)[$pos], where $pos is
the position of 'keycell' among the cells for this mapping.
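The single-pass alternative can be sketched in Python as follows: group the raw rows on the field at position `key_pos` (the analogue of `tokenize(., $delimiter)[$pos]`), and build the `<line>`/`<keycell>` markup only inside each group. The function name `stream_groups`, the `<group>` wrapper, and treating short or blank rows as having an empty key are all assumptions for illustration; `csv.reader` again stands in for `tokenize()`.

```python
import csv
import itertools
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List

def stream_groups(text_lines: Iterable[str], key_pos: int,
                  delimiter: str = ",") -> Iterator[ET.Element]:
    """Group adjacent CSV rows on the field at 1-based key_pos,
    generating XML only after the grouping key is known."""

    def key_of(row: List[str]) -> str:
        # Assumption: rows too short to have the key field get an empty key.
        return row[key_pos - 1] if len(row) >= key_pos else ""

    rows = csv.reader(text_lines, delimiter=delimiter)  # lazy over any line iterable
    for key, run in itertools.groupby(rows, key=key_of):
        group = ET.Element("group", key=key)
        for row in run:
            line = ET.SubElement(group, "line")
            for i, value in enumerate(row, start=1):
                ET.SubElement(line, "keycell" if i == key_pos else "cell").text = value
        yield group
```

Passing an open file object (for example, a hypothetical `big.csv` opened with `newline=""`) reads it line by line, so the whole input is never held at once; each yielded group can be serialized with `ET.tostring` and discarded.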

Any advice or ideas would be welcome.

Thanks!
Wendell

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^

