Re: [xsl] memory usage of xslt processing


Subject: Re: [xsl] memory usage of xslt processing
From: Thomas Porschberg <thomas.porschberg@xxxxxxxxx>
Date: Thu, 20 Apr 2006 06:32:11 +0200

On Wed, 19 Apr 2006 13:59:08 +0100,
"Michael Kay" <mike@xxxxxxxxxxxx> wrote:

> XSLT processors generally read the whole document into memory. Some
> products may be able to avoid this under certain circumstances, for
> example see
> http://www.saxonica.com/documentation/sourcedocs/serial.html for
> Saxon.
I have to use Xalan. I have heard of its "SQL extensions" and will
have to try them out.
> 
> Running one transformation per row is certainly feasible in principle
> though there may be a significant start-up overhead - you'll only
> find out by measurement.

Yes, but http://randspringer.de/sax_row.tar currently gives me an error.
It is also "ugly" because I have to provide the header myself.

> 
> Alternatively, why not retrieve the data from the database in
> transformer-sized chunks?

That does not remove the problem with the header. Of course, it should
be faster to run the stylesheet on multiple rows at a time instead of
on a single row.
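
A minimal sketch of the chunked variant, assuming the standard JAXP
SAXTransformerFactory API. The class name ChunkedTransform, the
transform() method, the CSV stylesheet and the header line are all
hypothetical, made up for illustration. The stylesheet is compiled once
into a Templates object; every chunk gets a fresh TransformerHandler and
is sent as its own small <dataset> document, so only one chunk is held
in memory at a time. The header is still written by hand.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

public class ChunkedTransform {
    // Hypothetical stylesheet: one semicolon-separated line per <row>.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/dataset'>"
      + "<xsl:apply-templates select='row'/></xsl:template>"
      + "<xsl:template match='row'>"
      + "<xsl:for-each select='value'>"
      + "<xsl:if test='position() &gt; 1'>;</xsl:if>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each><xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static void transform(Iterator<String[]> rows, int chunkSize, Writer out)
            throws Exception {
        // Assumes the factory in use supports the SAX feature (the JDK default does).
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // Compile the stylesheet once and reuse it for every chunk.
        Templates templates = stf.newTemplates(new StreamSource(new StringReader(XSL)));
        AttributesImpl noAttrs = new AttributesImpl();

        out.write("id;name;description\n"); // header, provided by hand

        while (rows.hasNext()) {
            // A handler is single-use, so create a new one per chunk.
            TransformerHandler h = stf.newTransformerHandler(templates);
            h.setResult(new StreamResult(out));
            h.startDocument();
            h.startElement("", "dataset", "dataset", noAttrs);
            for (int n = 0; n < chunkSize && rows.hasNext(); n++) {
                h.startElement("", "row", "row", noAttrs);
                for (String v : rows.next()) {
                    h.startElement("", "value", "value", noAttrs);
                    h.characters(v.toCharArray(), 0, v.length());
                    h.endElement("", "value", "value");
                }
                h.endElement("", "row", "row");
            }
            h.endElement("", "dataset", "dataset");
            h.endDocument(); // the transformation of this chunk runs here
        }
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "dog", "an animal may be dangerous"},
            new String[] {"2", "cat", "an animal likes milk"});
        StringWriter out = new StringWriter();
        transform(rows.iterator(), 1, out); // chunk size 1 = one run per row
        System.out.print(out);
    }
}
```

With a chunk size of 1 this degenerates to one transformation per row;
a larger chunk size trades memory for fewer start-ups. The text output
method may emit the platform line separator, so the line endings of the
hand-written header and of the transformed rows can differ.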

As a next step I will have a look at
http://stx.sourceforge.net/ and http://joost.sourceforge.net/.

Thank you,
Thomas

> 
> Michael Kay
> http://www.saxonica.com/ 
> 
> > -----Original Message-----
> > From: Thomas Porschberg [mailto:thomas.porschberg@xxxxxxxxx] 
> > Sent: 19 April 2006 13:36
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: [xsl] memory usage of xslt processing
> > 
> > Hi,
> > 
> > I have the following task:
> > Create an arbitrarily formatted file (XML, HTML, CSV, whatever)
> > based on a SELECT from a database.
> >
> > One constraint is that the data fetched from the database
> > cannot be held in memory as a whole.
> > Another constraint is that I cannot use XML functionality in
> > the database; I have to implement it on top of our database
> > access framework, which fetches records one at a time.
> > I also have to use Java and Xalan.
> >
> > My idea was to wrap every fetched row from the database
> > in simple generic XML and feed it to Xalan.
> >
> > Here is an example.
> > If my result set from the database looks like:
> > 
> > ID  Name  Description
> > --  ----  -----------
> > 1  "dog"  "an animal may be dangerous"
> > 2  "cat"  "an animal likes milk"
> > 
> > I create the following XML:
> > 
> > <?xml version="1.0" encoding="UTF-8"?>
> > <dataset>
> >  <row>
> >   <value>1</value>
> >   <value>dog</value>
> >   <value>an animal may be dangerous</value>
> >  </row>
> >  <row>
> >   <value>2</value>
> >   <value>cat</value>
> >   <value>an animal likes milk</value>
> >  </row>
> > </dataset>
> > 
> > I generate this XML as SAX events in a Java class
> > (StringArrayXMLReader) that implements the
> > org.xml.sax.XMLReader interface.
> > It has three methods:
> > 
> > public void init() throws SAXException {
> >     ch.startDocument();
> >     ch.startElement("", "dataset", "dataset", EMPTY_ATTR);
> > }
> >
> > public void close() throws SAXException {
> >     ch.endElement("", "dataset", "dataset");
> >     ch.endDocument();
> > }
> >
> > public void parse(String[] input) throws SAXException {
> >     ch.startElement("", "row", "row", EMPTY_ATTR);
> >     for (int i = 0; i < input.length; ++i) {
> >         ch.startElement("", "value", "value", EMPTY_ATTR);
> >         ch.characters(input[i].toCharArray(), 0, input[i].length());
> >         ch.endElement("", "value", "value");
> >     }
> >     ch.endElement("", "row", "row");
> > }
> > 
> > The parse method creates the <row>...</row> entry for the
> > String array passed to it.
> > The StringArrayXMLReader is associated with a
> > TransformerHandler, which uses an XSL stylesheet to transform
> > the XML into the desired output.
> > 
> > When the fetch from the database starts, I call init() (and
> > thus startDocument()), and after the fetch has finished I
> > call close() (and thus endDocument()).
> > I observed that the XSLT processing only starts when
> > endDocument() is called.
> > This is not acceptable for me, because I fear the XSLT
> > processor reads all the rows into memory until endDocument()
> > is called, in which case I risk running into an OutOfMemoryError.
> >
> > My second idea was to eliminate the init()/close() methods
> > and to treat one <row>...</row> section as a complete
> > input document for the processor. This has the disadvantage
> > that I have to create the head and tail of the output
> > manually (and in my example I get a NullPointerException when
> > the transformer is called a second time).
> > 
> > I have the following question:
> > Is it possible to create the output without holding all the
> > data in memory?
> > The input XML for the XSLT processing
> > <dataset>
> >   <row><value>...
> >   <row><value>...
> > </dataset>
> > looks very simple and the supplied XSL stylesheets will not
> > be complex, so I hope to get it working.
> > I also think that the task in general (producing formatted
> > output from a potentially very large data pool) should be a
> > common one. Unfortunately I have not done much XSLT processing
> > in the past, so I lack experience (only a little libxslt, fed
> > with a DOM tree).
> > If someone has useful links I would be very glad to hear of them.
> > My test code is available at:
> > 
> > http://randspringer.de/sax_row.tar and
> > http://randspringer.de/sax.tar
> > 
> > If someone could have a look at it I would really appreciate it.
> > 
> > Thomas