Performance issue when using "following-sibling"
Post by DanOvergaard
Hi,
I am facing a performance issue with an XSLT test that uses "following-sibling", and I would like to know if anybody has a suggestion for optimizing the test.
The problem is that the test time increases exponentially with the number of lines, and some of the XML files have 80,000 or more lines.
I have attached an XSLT file with a test using "following-sibling" (Original test) and also a "for-each-group" test to compare the performance, but it is as bad as using "following-sibling".
The test is used to make sure that each line identifier (cbc:ID) is unique.
Any comments and suggestions are appreciated.
Rgds,
Dan
LargeTestFile.zip
MediumTestFile.zip
XSLT.zip
HugeTestFIle.zip
Re: Performance issue when using "following-sibling"
Post by Martin Honnen
Can you tell us which XSLT processor you are using to run the tests? Is that Saxon 12? And if so, will it be the HE edition or can you also use the EE edition provided in oXygen?
I would certainly think that doing e.g.

    <xsl:template match="doc:Catalogue">
      <!-- The {...} text value templates below assume expand-text="yes" is in effect (XSLT 3.0) -->
      <xsl:for-each-group select="cac:CatalogueLine" group-by="cbc:ID">
        <!-- A second member in the group means this cbc:ID occurs more than once -->
        <xsl:if test="current-group()[2]">
          <Error context="{name(parent::*)}/{name()}">
            <Pattern>current-group()[2]</Pattern>
            <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
            <Duplicated-ID>{current-grouping-key()}</Duplicated-ID>
            <XPath>{path()}</XPath>
          </Error>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:template>

should perform way better than your code using following-sibling and preceding-sibling repeatedly.
If you can use EE, I would also suggest trying, for large data sets, whether XSLT 3 streaming (capturing the IDs in a map, for instance) is way faster than traditional tree-based processing that navigates the sibling axes.
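For illustration, the streaming idea could look roughly like the sketch below. It is not taken from the attached stylesheets, which are not visible here. It assumes the cac and cbc prefixes are bound to the usual UBL 2 namespaces, that the accumulator name id-count and the Errors wrapper element are free to choose, and that a streaming-capable processor such as Saxon EE is used. The accumulator keeps a map from each cbc:ID value to the number of times it has been seen so far, so no sibling axis is ever navigated.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <!-- Namespace URIs above are an assumption (standard UBL 2); adjust to the real documents -->

  <!-- Stream the input; skip everything that no template matches explicitly -->
  <xsl:mode streamable="yes" use-accumulators="id-count" on-no-match="shallow-skip"/>

  <!-- Map from ID value to the number of occurrences seen so far -->
  <xsl:accumulator name="id-count" as="map(xs:string, xs:integer)"
                   initial-value="map{}" streamable="yes">
    <xsl:accumulator-rule match="cac:CatalogueLine/cbc:ID/text()"
        select="map:put($value, string(.), (map:get($value, string(.)), 0)[1] + 1)"/>
  </xsl:accumulator>

  <!-- Hypothetical wrapper element so the result is a single well-formed document -->
  <xsl:template match="/">
    <Errors>
      <xsl:apply-templates/>
    </Errors>
  </xsl:template>

  <!-- Report an ID whenever its running count exceeds 1 -->
  <xsl:template match="cac:CatalogueLine/cbc:ID/text()">
    <xsl:if test="accumulator-before('id-count')(string(.)) gt 1">
      <Error>
        <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
        <Duplicated-ID>{.}</Duplicated-ID>
      </Error>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the for-each-group version, this sketch emits one Error for every repeated occurrence (the second, third, and so on) rather than one per duplicated ID, and it does not output the path of the offending line. Whether it is actually faster on the 80,000-line files would need to be measured.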