Validation performance: Schema cf. Schematron.
Oxygen general issues.
- snewton
- Posts: 8
- Joined: Tue Aug 16, 2011 4:23 pm
Using oXygen XML Editor v14, I validated 5000 files in one project (all <1MB, most <0.1MB). Validating against an RNC schema took <4 min, whereas validating the same set of files against a Schematron file took nearly an hour. The RNC has ~300 elements and many attributes; the Schematron has ~150 <rule>s. I ran this exercise because colleagues wondered whether continually increasing the number of Schematron rules we use was having a detrimental effect on validation performance. This test was my first attempt at investigating, so perhaps it's not a well-designed test.
I don't really have an appreciation of what is going on behind the scenes, but to me this seems a large difference. I had only ever validated single files before, so until this exercise I never realised how slow validating against Schematron rules is relative to validating against a schema. Please could someone explain in relatively simple terms what is happening here, and whether there is some obvious way of speeding up Schematron validation?
Many thanks indeed - Simon.
			
			
									
									
- sorin_ristache
- Posts: 4141
- Joined: Fri Mar 28, 2003 2:12 pm
Re: Validation performance: Schema cf. Schematron.
Hello,
The total processing time depends on the complexity of the XPath expressions in the Schematron schema. Such a schema is applied to the XML instances by first compiling it into an XSLT stylesheet and then running an XSLT transformation based on this intermediate stylesheet. So one obvious way of speeding up the validation is to optimize the XPath expressions used in the rules.
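To make the XPath point a little more concrete, here is a minimal Python sketch of one common kind of optimization. It is only an analogy, not what oXygen or the generated XSLT actually does: the element names (`target`, `xref`), the attributes and both functions are hypothetical. The first check rescans the document for every context node, much like an assert that uses a document-wide search such as `//target[@id = ...]`. The second builds a lookup index once, roughly what a keyed lookup gives you in XSLT. Whether this particular pattern appears in your rules is something you would have to check.

```python
import xml.etree.ElementTree as ET


def build_doc(n):
    """Build a hypothetical document with n <target> and n <xref> elements."""
    root = ET.Element("doc")
    for i in range(n):
        ET.SubElement(root, "target", id=f"t{i}")
    for i in range(n):
        ET.SubElement(root, "xref", ref=f"t{i}")
    return root


def check_by_rescanning(root):
    """For every <xref>, scan the document again to find its <target>.

    Returns (broken_refs, comparisons), where comparisons counts the
    elements examined during the scans.
    """
    comparisons = 0
    broken = []
    for xref in root.iter("xref"):
        found = False
        for el in root.iter():  # document-wide scan per context node
            comparisons += 1
            if el.tag == "target" and el.get("id") == xref.get("ref"):
                found = True
                break
        if not found:
            broken.append(xref.get("ref"))
    return broken, comparisons


def check_with_index(root):
    """Collect all target ids in one pass, then do one set lookup per <xref>.

    Returns (broken_refs, comparisons), where comparisons counts the
    elements examined while indexing plus one lookup per <xref>.
    """
    comparisons = 0
    ids = set()
    for el in root.iter():  # single pass to build the index
        comparisons += 1
        if el.tag == "target":
            ids.add(el.get("id"))
    broken = []
    for xref in root.iter("xref"):
        comparisons += 1
        if xref.get("ref") not in ids:
            broken.append(xref.get("ref"))
    return broken, comparisons


if __name__ == "__main__":
    doc = build_doc(200)
    print("rescanning:", check_by_rescanning(doc)[1], "comparisons")
    print("with index:", check_with_index(doc)[1], "comparisons")
```

Both functions report the same broken references, but the rescanning version does work that grows roughly with the square of the document size, while the indexed version grows linearly. With ~150 rules applied to 5000 files, patterns like the first one can add up quickly.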
The size of the XML files also weighs heavily on the overall processing time, since an in-memory DOM tree is built from each XML file (as opposed to plain SAX parsing in the case of RELAX NG validation). I suspect that in your case this is what accounts for the surprisingly large difference between the two types of validation. However, I don't see any way around this, since the size of each XML instance file is a fixed input for the whole validation process, and building the DOM tree is a must. A possible improvement would be to parse each XML instance in streaming mode (that is, without building a DOM tree internally during the XSLT transformation phase); however, that is more of a request for enhancement for a future version.
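As a rough illustration of the difference between the two parsing styles, the Python sketch below runs the same simple check twice: once as a SAX handler that only sees a stream of events, and once over a fully built in-memory tree. The sample document, the `item`/`price` names and both functions are made up for the example, and Python's standard library stands in for the real validators here.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample input: one <item> lacks the price attribute.
SAMPLE = b"<order><item price='3'/><item/><item price='7'/></order>"


class MissingPriceHandler(xml.sax.ContentHandler):
    """Streaming check: keeps a single counter, never the whole document."""

    def __init__(self):
        super().__init__()
        self.missing = 0

    def startElement(self, name, attrs):
        # Called once per start tag as the parser reads the input.
        if name == "item" and "price" not in attrs:
            self.missing += 1


def count_missing_streaming(data):
    """SAX style: process events as they arrive, without building a tree."""
    handler = MissingPriceHandler()
    xml.sax.parseString(data, handler)
    return handler.missing


def count_missing_tree(data):
    """Tree style: build the whole document in memory, then query it."""
    root = ET.fromstring(data)
    return sum(1 for item in root.iter("item") if item.get("price") is None)


if __name__ == "__main__":
    print("streaming:", count_missing_streaming(SAMPLE))
    print("tree:", count_missing_tree(SAMPLE))
```

The streaming version only ever holds the current event and a counter, so its memory use does not grow with the file. The tree version has to allocate a node for every element before the first check can run. The trade-off is that a rule which needs to look elsewhere in the document (ancestors, later siblings, other branches) is easy to evaluate on a tree and awkward on a stream, which is why XPath-based validation usually builds the tree first.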
Regards,
Sorin
			
			
									
									
			