Sloooow XPath evaluation with large files
Posted: Wed May 26, 2004 12:51 am
by dsewell
oXygen takes a very long time to evaluate an XPath expression and return the results for large files. For example, with a 1.3 MB TEI document it takes nearly a minute to evaluate //body on a dual G4 Power Mac. The same XPath query takes under a second using libxslt.
Is this just a function of the Java XML parser? Can anything be done to improve performance?
Posted: Thu May 27, 2004 8:13 pm
by george
Hi David,
Please give us some more details:
Do you see the same delay when you run the XPath expression /* as well?
How many results do you get?
What is the size of your document?
Also, check that the TEI catalog is set.
Thanks,
George
More details
Posted: Thu May 27, 2004 9:02 pm
by dsewell
There are delays with any XPath expression. Expressions that return many results are somewhat slower than ones that return a single node or only a few. Even an expression that points to a nonexistent node, such as /foo/bar/baz, takes a very long time to return an empty result: about 75 seconds on an XML file of 1438094 bytes. For comparison, an XSLT script that returns each /foo/bar/baz runs in about 6 seconds with Saxon on my system.
I do have the TEI catalog set in my preferences, but removing the DOCTYPE declaration and running the same search makes no difference to performance.
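To reproduce a timing like this outside the editor, one could parse the file and evaluate the query as two separately timed steps with the standard JAXP APIs. The following is a minimal Java sketch, assuming the JDK's built-in parser and XPath engine (not oXygen's, Xalan's or Saxon's code paths); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathTiming {
    /** Parses XML bytes into a namespace-aware DOM tree. */
    static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    /** Evaluates an XPath expression against an already-parsed tree and counts the hits. */
    static int countMatches(Document doc, String expr) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Tiny placeholder document; for a real test, read the file instead,
        // e.g. java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0])).
        byte[] xml = "<TEI><text><body><p>x</p></body></text></TEI>"
                .getBytes(StandardCharsets.UTF_8);

        long t0 = System.nanoTime();
        Document doc = parse(xml);
        long t1 = System.nanoTime();
        int hits = countMatches(doc, "/foo/bar/baz");
        long t2 = System.nanoTime();

        // Report parsing and evaluation separately, since either could dominate.
        System.out.printf("parse: %d ms, evaluate: %d ms, hits: %d%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, hits);
    }
}
```

If the file has a DOCTYPE, the JDK parser by default still tries to load the external DTD even without validation, so parse time can depend on how the DTD is resolved. That is one reason to time the two steps separately.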
Posted: Fri May 28, 2004 10:50 am
by george
Hi David,
For a 1 MB document I get the result of /foo/bar/baz in about 2 seconds. Can you zip a document and send it to support@oxygenxml.com so we can check whether we get the same results here? If this is not possible, let us know and we will point you to a document so we can run similar tests.
We are using the Xalan XPath API. What time do you get if you run the XSLT script with Xalan?
Best Regards,
George
Posted: Fri May 28, 2004 6:57 pm
by dsewell
George -- I will email a file to support.
If I use the Xalan transformer in the XSLT configuration instead of the Saxon, it processes the XSLT script to return /foo/bar/baz in about 4 seconds (compared to 6 for Saxon).
The choice of XSLT transformer should not affect the behavior of the XPath toolbar search, should it?
Posted: Wed Jun 09, 2004 8:13 pm
by george
(Just to update the forum entry)
The XPath query takes longer than running a stylesheet because, for the query, we set some properties on the Transformer that make it report the location information needed to locate the result hits in the document.
The fix is a medium-term one: it involves rewriting the XPath support to use Saxon instead of Xalan.
The choice of the XSLT transformer engine does not affect the XPath execution time.
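The post does not say which Transformer properties are involved. As a rough, JDK-only illustration of what keeping "location information" means, the sketch below builds a DOM from SAX events, records the line number reported by the SAX Locator on every element, and then maps each XPath hit back to its line. This is not oXygen's or Xalan's actual mechanism; LocatedXPath, LINE_KEY, parseWithLines and hitLines are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatedXPath {
    // Hypothetical user-data key under which each element's line number is stored.
    static final String LINE_KEY = "lineNumber";

    /**
     * Builds a DOM from SAX events, recording the start-tag line of every element.
     * Comments and processing instructions are not copied into the tree.
     */
    static Document parseWithLines(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            private Locator locator;
            private final Deque<Node> open = new ArrayDeque<>();
            private final StringBuilder text = new StringBuilder();

            @Override public void setDocumentLocator(Locator l) { locator = l; }

            @Override public void startDocument() { open.push(doc); }

            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                flushText();
                Element e = doc.createElementNS(uri.isEmpty() ? null : uri, qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    String attUri = atts.getURI(i);
                    e.setAttributeNS(attUri.isEmpty() ? null : attUri, atts.getQName(i), atts.getValue(i));
                }
                // Per-element bookkeeping: the extra cost of keeping location info.
                e.setUserData(LINE_KEY, locator.getLineNumber(), null);
                open.peek().appendChild(e);
                open.push(e);
            }

            @Override public void endElement(String uri, String local, String qName) {
                flushText();
                open.pop();
            }

            @Override public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            // Appends any buffered character data to the currently open element.
            private void flushText() {
                if (text.length() > 0) {
                    open.peek().appendChild(doc.createTextNode(text.toString()));
                    text.setLength(0);
                }
            }
        });
        return doc;
    }

    /** Evaluates an XPath expression and returns the recorded line of each hit (-1 if none). */
    static List<Integer> hitLines(Document doc, String expr) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<Integer> lines = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            Object line = hits.item(i).getUserData(LINE_KEY);
            lines.add(line instanceof Integer ? (Integer) line : -1);
        }
        return lines;
    }
}
```

In this sketch the bookkeeping is paid for every element in the file, whether or not it matches the query, so even an expression with no hits, such as /foo/bar/baz, still bears the cost.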
Best Regards,
George