Subject: Re: [xsl] Approach to transform 250GB xml data
From: "Abel Braaksma (Exselt) abel@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 10 Sep 2014 15:14:39 -0000


> others are impossible (e.g. sorting).

I beg to differ. At XML London 2014 I demonstrated a streamable sort that uses
a two-pass method: slice the input first, then combine the slices with
xsl:merge. Doing it in one pass is indeed impossible with standard features
alone. But in terms of "possible" versus "impossible": with nothing but XSLT,
and the willingness to spend two passes (which I believe is not much for such
a complex task), you can.
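
As a rough illustration of the slice-then-merge idea, here is a minimal sketch
in Python rather than XSLT. It is not the stylesheet shown at XML London. It
assumes that each slice is sorted before it is written out (a merge step needs
sorted inputs), that records are plain strings without newlines, and that a
slice fits in memory. The function names and the slice_size parameter are
hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_slices(records, slice_size, directory):
    """Pass 1: cut the input stream into slices, sort each one, save it."""
    paths = []
    iterator = iter(records)
    while True:
        # Only one slice is held in memory at a time.
        chunk = sorted(islice(iterator, slice_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(dir=directory, suffix=".slice")
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            out.writelines(record + "\n" for record in chunk)
        paths.append(path)
    return paths


def merge_slices(paths):
    """Pass 2: read all sorted slices in parallel and merge them lazily."""
    files = [open(path, encoding="utf-8") for path in paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        # heapq.merge keeps only the current head of each stream in memory.
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()


def two_pass_sort(records, slice_size=2):
    """Sort a stream of string records using sorted slices on disk."""
    with tempfile.TemporaryDirectory() as directory:
        paths = write_sorted_slices(records, slice_size, directory)
        yield from merge_slices(paths)
```

In XSLT terms, the first function stands in for the slicing pass and the
second for what xsl:merge would do over the slice documents.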

And when the number of required slices can be determined statically, the
slices can be selected (e.g., all lines starting with "a", all lines starting
with "b"), and they are manageable in memory, you could conjure up a solution
with xsl:fork and do it in one pass. That assumes your processor supports
xsl:fork the way it is intended: the spec does not require a one-pass approach
there, and I doubt it is even possible in all streaming scenarios.
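
The one-pass variant can be sketched the same way, again in Python as an
analogy and not as xsl:fork code. The assumptions are that the slice keys are
known up front (here the first character of each record), that every slice
fits in memory, and that ordering the slices by key and then sorting within
each slice yields the overall order. The function name and the slice_keys
parameter are hypothetical.

```python
def one_pass_sort(records, slice_keys):
    """Read the input once, route each record to its predeclared slice,
    then emit the slices in key order with each slice sorted in memory."""
    # One bucket per statically known slice, like one prong per selection.
    buckets = {key: [] for key in slice_keys}
    for record in records:
        key = record[:1]
        if key not in buckets:
            raise ValueError(f"no slice declared for record {record!r}")
        buckets[key].append(record)
    for key in sorted(buckets):
        yield from sorted(buckets[key])
```

For example, list(one_pass_sort(["bob", "alice", "bea", "adam"], "ab")) reads
the input a single time and yields the records in sorted order. Whether a
given XSLT processor evaluates the equivalent xsl:fork in one pass is a
separate question, as noted above.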


Abel Braaksma
Exselt XSLT 3.0 streaming processor

> -----Original Message-----
> From: Michael Kay mike@xxxxxxxxxxxx [mailto:xsl-list-
> service@xxxxxxxxxxxxxxxxxxxxxx]
> Sent: Wednesday, September 10, 2014 10:12 AM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Approach to transform 250GB xml data
> It is not practical to transform this using XSLT except by use of a
> streaming XSLT processor such as Saxon-EE, and even then it depends on the detailed
> nature of the transformation to be performed. Some transformations are
> readily streamed (e.g. renaming all the elements), others are impossible
> (e.g. sorting). Tell us more about what the transformation is doing.
> Michael Kay
> Saxonica
> mike@xxxxxxxxxxxx
> +44 (0) 118 946 5893
> On 10 Sep 2014, at 08:36, Vishnu vishnu@xxxxxxxxxxxx <xsl-list-
> service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > Hi,
> >
> > I have approx 250GB xml data and I want to transform it using XSLT 2.0.
> > What should be the best approach to transform this database?
> >
> > I tried it with ANT but it gave me JAVA heap space error message.
> >
> > Please suggest.
> >
> > Thanks!
> >
> > Vishnu Singh