Validation performance: Schema cf. Schematron.

snewton
Posts: 8
Joined: Tue Aug 16, 2011 4:23 pm


Post by snewton »

Using oXygen XML Editor v14, validating 5000 files in one project (all <1MB, most <0.1MB) against an RNC Schema took <4 min, whereas validating the same set of files against a Schematron file took nearly an hour. The RNC has ~300 elements and many attributes; the Schematron has ~150 <rule>s. I did this exercise as colleagues wondered whether continually increasing the number of Schematron rules we use was having a detrimental effect on validation performance. This test was my first attempt at investigating... perhaps it's not a well-designed test.

I don't really have an appreciation of what is going on behind the scenes, but to me this seems a large difference (or rather, I never realised until I did this exercise how slow validating against Schematron rules is relative to validating against a Schema, since I've only ever validated single files before). Please could someone explain in relatively simple terms what is happening here, and whether there is some obvious way of speeding up Schematron validation?

Many thanks indeed - Simon.
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: Validation performance: Schema cf. Schematron.

Post by sorin_ristache »

Hello,

The total processing time depends on the complexity of the XPath expressions in the Schematron schema. Such a schema is applied to the XML instances by first compiling it into an XSLT stylesheet and then running an XSLT transformation based on this intermediate stylesheet. So one obvious way of speeding up the validation is to optimize the XPath expressions used in the rules.
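To give a feel for what "XPath optimization" can mean here, the following is only a rough sketch in Python, not what oXygen or the XSLT engine actually runs. It assumes a hypothetical document in which every `xref/@target` must match some `@id`. The naive function mimics a test such as `//*[@id = current()/@target]`, which rescans the whole tree for each context node; the indexed one collects the ids once, which is roughly the idea behind `xsl:key` / `key()` in the XSLT query binding. The element names, attribute names and function names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: each <xref target="..."/> should point at an existing @id.
SAMPLE = """<doc>
  <sec id="s1"><xref target="s2"/></sec>
  <sec id="s2"><xref target="s9"/></sec>
</doc>"""


def broken_refs_naive(root):
    """Rescan the whole tree for every xref, similar to evaluating
    //*[@id = current()/@target] once per context node."""
    broken = []
    for xref in root.iter("xref"):
        target = xref.get("target")
        # Full document scan for each xref: cost grows as xrefs * elements.
        if not any(el.get("id") == target for el in root.iter()):
            broken.append(target)
    return broken


def broken_refs_indexed(root):
    """Collect all ids in a single pass, then use set lookups
    (the same idea as an xsl:key index)."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return [x.get("target") for x in root.iter("xref")
            if x.get("target") not in ids]


root = ET.fromstring(SAMPLE)
print(broken_refs_naive(root))
print(broken_refs_indexed(root))
```

Both functions report the same broken reference; the difference is only in how much of the tree gets walked. With ~150 rules over 5000 files, a few expressions of the first kind could plausibly account for much of the time, though that would need measuring on the real schema.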

The size of the XML files also weighs heavily on the overall processing time, since an in-memory DOM tree is built from each XML file (as opposed to mere SAX parsing in the case of RELAX NG validation). I suspect that in your case this is what accounts for the surprisingly large difference between the two types of validation. However, I don't see any way around this, since the size of each XML instance file is a given input constant for the whole validation process, and building the DOM tree is a must. A possible improvement would be to parse each XML instance in streaming mode (that is, without building a DOM tree internally during the XSLT transformation phase), but that is more of a request for enhancement for a future version.
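As a loose illustration of the difference between the two parsing styles, here is a small Python sketch using only the standard library. It is not the code path of either validator; it only contrasts an event-driven (SAX) pass, which keeps nothing but a running tally, with a pass that first builds the whole document tree in memory. The sample document and the helper names are assumptions made up for the example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
SAMPLE = b"<doc><sec><p>one</p><p>two</p></sec><sec><p>three</p></sec></doc>"


class ElementCounter(xml.sax.ContentHandler):
    """SAX handler: sees each start tag as an event, stores only the counts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_streaming(data):
    """Single pass over parser events; no tree is kept in memory."""
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.counts


def count_with_tree(data):
    """Build the full DOM first, then walk it."""
    dom = minidom.parseString(data)
    counts = {}
    for node in dom.getElementsByTagName("*"):
        counts[node.tagName] = counts.get(node.tagName, 0) + 1
    return counts


print(count_streaming(SAMPLE))
print(count_with_tree(SAMPLE))
```

Both give the same tallies, but the tree version has to allocate a node object for every element and text node before any checking starts. XPath tests that look backwards or across the document generally need that tree, which is why the streaming approach is not simply a switch that can be turned on for Schematron.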


Regards,
Sorin
snewton
Posts: 8
Joined: Tue Aug 16, 2011 4:23 pm

Re: Validation performance: Schema cf. Schematron.

Post by snewton »

Many thanks for your response, Sorin. I'll consult colleagues on what you have suggested.
Simon.