
[xsl] "Heap" of trouble handling input file of 500 MByte


Subject: [xsl] "Heap" of trouble handling input file of 500 MByte
From: thehulk@xxxxxxxxxxx
Date: Sat, 19 Feb 2011 19:47:22 +0000 (UTC)

Hello,

Thanks mainly to this list, I am successfully processing 6,335 of my 6,337 input files. The 6,335 are under 250 MByte each. The two problem cases are each just under 500 MByte. 

Are there any tips or tricks or tools which will make this possible on my 32-bit Windows XP SP3 machine? 

I am using Java code and the javax.xml.* classes to do the transform. The main piece of Executor.java is:

// Prepare the transformer factory
javax.xml.transform.TransformerFactory transFact = javax.xml.transform.TransformerFactory.newInstance();
// Prepare an xsl source for the transformer
javax.xml.transform.Source xsltSource = new javax.xml.transform.stream.StreamSource( xslFile);
// Make the Transformer
javax.xml.transform.Transformer trans = transFact.newTransformer(xsltSource);
.....
//Prepare the transformation input and output
File xmlFileInput = new File(folder + xmlfilename);
File xmlFileOutput = new File(targetFolder + addedPrefix + xmlfilename);
javax.xml.transform.Source xmlSource = new javax.xml.transform.stream.StreamSource(xmlFileInput);
javax.xml.transform.Result xmlResult = new javax.xml.transform.stream.StreamResult(xmlFileOutput);
// Do the transform
trans.transform(xmlSource, xmlResult);
						
The 6,335 process fine within Eclipse 3.6.1 with a VM heap of one GByte, using "-Xms1024m -Xmx1024m" in the Run Configuration.

The same Java class also runs fine (in 6,335 cases) from the Windows XP SP3 command line:
  java Executor myArguments  -Xms1024m -Xmx1024m

I have increased the VM heap settings at the command line, in steps, going as high as 4 GByte,
  java Executor myArguments  -Xms4096m -Xmx4096m

but regardless, the two big files provoke:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.util.Arrays.copyOf(Unknown Source)
        at java.util.Vector.ensureCapacityHelper(Unknown Source)
        at java.util.Vector.addElement(Unknown Source)
        at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.startElement(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.startElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
        at Executor.main(Executor.java:183)

What to try next?

Is it really a lack of heap space? I imagine that heap requirements are proportional to file size, but then a 3 GByte VM should work. 
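
One way to check whether the requested heap is really in effect might be a small helper like the one below. This is only a sketch: the class name HeapCheck and its method names are made up for illustration and are not part of Executor.java. It prints the maximum heap the running JVM reports, and flags any -Xms/-Xmx strings that arrive as ordinary program arguments. The java launcher's documented form is "java [options] class [arguments]", so anything placed after the class name is handed to main() rather than applied as a JVM option.

```java
public class HeapCheck {

    /** Maximum heap the running JVM will attempt to use, in MByte. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** True if any program argument looks like a JVM heap option. */
    public static boolean hasStrayHeapOption(String[] args) {
        for (String arg : args) {
            if (arg.startsWith("-Xms") || arg.startsWith("-Xmx")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MByte");
        if (hasStrayHeapOption(args)) {
            // The launcher passed these to main(), so they did not size the heap.
            System.out.println("Heap options arrived as program arguments, not JVM options.");
        }
    }
}
```

Running it both ways, "java -Xmx1024m HeapCheck myArguments" and "java HeapCheck myArguments -Xmx1024m", should show whether the position of the options makes a difference on this machine.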

Could the problem relate to the XSL code itself, which is very brief and does a sort-while-copying operation?
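
To illustrate what is meant by sort-while-copying, here is a minimal sketch. It is not the actual stylesheet: the element name "record", the attribute "key", the wrapper "records" and the class name SortWhileCopying are all placeholders chosen for the example. The stylesheet is embedded as a string so the whole thing runs on a tiny in-memory input with the same javax.xml.transform classes as above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SortWhileCopying {

    // Hypothetical stylesheet: copies each <record> unchanged, ordered by its @key.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/records'>"
        + "<records>"
        + "<xsl:for-each select='record'>"
        + "<xsl:sort select='@key'/>"
        + "<xsl:copy-of select='.'/>"
        + "</xsl:for-each>"
        + "</records>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the placeholder stylesheet to an XML string and returns the result. */
    public static String transform(String xml) throws Exception {
        Transformer trans = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<records><record key='b'/><record key='a'/></records>";
        System.out.println(transform(input));
    }
}
```

With a stylesheet of this shape, xsl:sort has to see every record before the first one can be written, so I would expect an XSLT 1.0 processor to need the whole input in memory whatever the copying part looks like.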

TIA!

