XSLT transformation as batch operation - performance

Posted: Fri Jun 23, 2017 10:59 pm
by michaelmh
Hi all,

I have read the topics in the thread "XSLT in batch" and I was glad to learn something about XML Refactoring.

We are building a DTD migration XSLT that must ultimately be applied to thousands of XML documents distributed across a folder structure. Right-clicking the folder and using the Transform > Transform with command is an easy solution, at least for testing the XSLT on a certain number of files. (BTW, we cannot use ${cf} as the target, because we need all files intact for access during the processing of later files.)

I am not quite satisfied with the performance of this batch transformation, as I am under the impression that Oxygen parses the XSLT anew for every file found. Is that correct?

Are there ways to cache the parsed XSLT?
Or, are there ways using Saxon EE to create and execute an SEF file?
Or, do you have other suggestions for increasing batch performance? (Apart from making sure the XSLT is well-written.)

Before I have one of our developers write a simple transformation runner as a Java application, I am wondering whether I am already getting the maximum performance from Oxygen or whether there are options to improve it. We would not blame OxygenXML if this use case is not supported; I just don't want to miss a feature. There is so much to learn…

Thanks a lot for your time,

- Michael

Re: XSLT transformation as batch operation - performance

Posted: Mon Jun 26, 2017 9:10 am
by Radu
Hi Michael,

Indeed, Oxygen does not reuse the parsed XSLT stylesheet between transformations, so the XSLT is parsed and compiled each time. We had some plans to reuse the parsed XSLT, and I will try to increase the internal issue's priority based on your feedback.
About this remark:
"Or, do you have other suggestions for increasing batch performance? (Apart from making sure the XSLT is well-written.)"
Maybe the XSLT itself could read the XML documents from the folder using the XSLT 2.0 "collection" function and then use the "result-document" functionality to process them and write the results to disk. That way the XSLT itself would do the batch processing, and the stylesheet would be compiled only once.
Or you could indeed compile the XSLT to an "SEF" file (in Oxygen 19.0 you can find a specific action for this in the Tools menu).
You can then use the "sef" file as the "XSLT" file in the transformation scenario dialog; it should work when transforming with Saxon PE or EE. I'm not sure about the performance increase, but you could try this.
Or you could try to create an Ant build file and use the <xslt> task to convert each of the XML documents:

https://ant.apache.org/manual/Tasks/style.html

From what I remember, the <xslt> task by default reuses the XSLT transformer between runs.
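
For comparison, the "simple transformation runner" mentioned in the question could be sketched roughly as follows. This is only a minimal illustration using the standard JAXP API (javax.xml.transform), not what Oxygen or Ant do internally. The class name BatchTransform and the methods compile and transformTree are hypothetical names chosen for the example. The idea is that the stylesheet is compiled once into a Templates object, which then creates a cheap Transformer per file, and the results go to a separate output folder so the input files stay intact. Note that the XSLT processor built into the JDK only supports XSLT 1.0; for XSLT 2.0 the sketch assumes a Saxon jar is on the classpath so that TransformerFactory.newInstance() picks it up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical runner: compile the stylesheet once, reuse it for every file.
public class BatchTransform {

    // Parses and compiles the stylesheet a single time.
    // A Templates object can be reused for any number of transformations.
    public static Templates compile(StreamSource stylesheet) throws TransformerException {
        return TransformerFactory.newInstance().newTemplates(stylesheet);
    }

    // Transforms every *.xml file below inputRoot to the same relative path
    // below outputRoot. Input files are never modified.
    // Returns the number of files transformed.
    public static int transformTree(Templates templates, Path inputRoot, Path outputRoot)
            throws IOException, TransformerException {
        List<Path> inputs;
        // Collect the file list first, so files written later are not picked up.
        try (Stream<Path> walk = Files.walk(inputRoot)) {
            inputs = walk.filter(Files::isRegularFile)
                         .filter(p -> p.getFileName().toString().endsWith(".xml"))
                         .collect(Collectors.toList());
        }
        for (Path in : inputs) {
            Path out = outputRoot.resolve(inputRoot.relativize(in));
            Files.createDirectories(out.getParent());
            // Creating a Transformer from Templates does not recompile the XSLT.
            Transformer transformer = templates.newTransformer();
            try (OutputStream os = Files.newOutputStream(out)) {
                transformer.transform(new StreamSource(in.toFile()), new StreamResult(os));
            }
        }
        return inputs.size();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: BatchTransform <stylesheet> <inputDir> <outputDir>");
            return;
        }
        Templates templates = compile(new StreamSource(Paths.get(args[0]).toFile()));
        int count = transformTree(templates, Paths.get(args[1]), Paths.get(args[2]));
        System.out.println("Transformed " + count + " file(s)");
    }
}
```

Whether this is noticeably faster than the batch action depends on how much of the total time is spent compiling the stylesheet rather than transforming, so it would be worth measuring on a sample of the files first.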

Regards,
Radu