Performance issue when using “following-sibling”

Posted: Tue Feb 20, 2024 12:48 pm
by DanOvergaard
Hi,

I am facing a performance issue with an XSLT test that uses “following-sibling”, and I would like to know if anybody has a suggestion for optimizing it.

The problem is that the test time increases steeply (much faster than linearly) with the number of lines, and some of the XML files have 80,000 or more lines.

I have attached an XSLT file with the test using “following-sibling” (the original test) and also a “for-each-group” version for comparison, but its performance is as bad as the “following-sibling” version.

The test is used to make sure that each line identifier (cbc:ID) is unique.
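
For readers without the attachments, a uniqueness test built on following-sibling often looks something like the hypothetical sketch below; the actual test is in XSLT.zip and may differ. It shows why the run time explodes: in a straightforward evaluation, line k is compared with every later line, so n lines need about n(n-1)/2 comparisons, which is roughly 3.2 billion for 80,000 lines. The UBL 2 namespace URIs are an assumption based on the doc/cac/cbc prefixes.

```xml
<!-- HYPOTHETICAL sketch of a sibling-axis uniqueness test, for illustration only.
     Namespace URIs assume UBL 2 catalogues; adjust them to match your files. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="urn:oasis:names:specification:ubl:schema:xsd:Catalogue-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <xsl:template match="/">
    <Errors>
      <xsl:apply-templates select="doc:Catalogue/cac:CatalogueLine"/>
    </Errors>
  </xsl:template>

  <!-- Each line compares its ID with the IDs of ALL later lines, so line k
       scans n - k siblings and n lines cost about n(n-1)/2 comparisons. -->
  <xsl:template match="cac:CatalogueLine">
    <xsl:if test="cbc:ID = following-sibling::cac:CatalogueLine/cbc:ID">
      <Error>
        <Duplicated-ID>{cbc:ID}</Duplicated-ID>
      </Error>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```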

Any comments and suggestions are appreciated.

Rgds,
Dan
Attachments: LargeTestFile.zip, MediumTestFile.zip, XSLT.zip, HugeTestFIle.zip

Re: Performance issue when using “following-sibling”

Posted: Fri Mar 08, 2024 6:26 pm
by Martin Honnen
Can you tell us which XSLT processor you are using to run the tests? Is it Saxon 12? If so, is it the HE edition, or can you also use the EE edition provided in oXygen?

I would certainly think that doing something like

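<!-- Assumes XSLT 3.0 with expand-text="yes" in scope (e.g. on xsl:stylesheet),
     so that the {...} text value templates below are evaluated. -->
<xsl:template match="doc:Catalogue">
  <!-- One group per distinct cbc:ID; a group with a second member is a duplicate. -->
  <xsl:for-each-group select="cac:CatalogueLine" group-by="cbc:ID">
    <xsl:if test="current-group()[2]">
      <!-- Inside the group, the context item is the group's first CatalogueLine. -->
      <Error context="{name(parent::*)}/{name()}">
        <Pattern>current-group()[2]</Pattern>
        <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
        <Duplicated-ID>{current-grouping-key()}</Duplicated-ID>
        <XPath>{path()}</XPath>
      </Error>
    </xsl:if>
  </xsl:for-each-group>
</xsl:template>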
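
should perform way better than your code, which navigates the following-sibling and preceding-sibling axes repeatedly: grouping visits each line once, whereas comparing every line with its siblings takes time roughly proportional to the square of the number of lines.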
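
Another common way to avoid the sibling scan, not mentioned in the thread, is an xsl:key index. The sketch below is a hypothetical alternative to the grouping approach, not Martin's code; the key name line-by-id is illustrative and the namespace URIs assume UBL 2 catalogues. The processor builds the index, and each line then looks up its own ID instead of scanning its siblings.

```xml
<!-- HYPOTHETICAL alternative using xsl:key; the key name "line-by-id" is illustrative.
     Namespace URIs assume UBL 2 catalogues; adjust them to match your files. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="urn:oasis:names:specification:ubl:schema:xsd:Catalogue-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <!-- Index every catalogue line by its ID; lookups then avoid a sibling scan. -->
  <xsl:key name="line-by-id" match="cac:CatalogueLine" use="cbc:ID"/>

  <xsl:template match="/">
    <Errors>
      <xsl:apply-templates select="doc:Catalogue/cac:CatalogueLine"/>
    </Errors>
  </xsl:template>

  <xsl:template match="cac:CatalogueLine">
    <!-- All lines sharing this line's ID, in document order. -->
    <xsl:variable name="same-id" select="key('line-by-id', cbc:ID)"/>
    <!-- Report a duplicated ID once: at its first line, if a second line exists. -->
    <xsl:if test=". is $same-id[1] and exists($same-id[2])">
      <Error>
        <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
        <Duplicated-ID>{cbc:ID}</Duplicated-ID>
      </Error>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Like the grouping version, this reports each duplicated ID once. Both approaches should avoid the quadratic scan, so it may be worth timing them against each other on the large files.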

If you can use EE, I would also suggest trying XSLT 3 streaming on the large data sets (capturing the IDs in a map, for instance) to see whether it is much faster than traditional tree-based processing that navigates the sibling axes.
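
A minimal sketch of the streaming idea, assuming Saxon EE (or another processor with XSLT 3.0 streaming support), UBL 2 namespace URIs, and that a single forward pass over the cbc:ID elements is enough. xsl:iterate carries a map of the IDs seen so far from one line to the next, so no tree is built for the document; only the map of IDs stays in memory. It has not been run against the attached files; a processor that checks streamability should reject it at compile time if any construct is not streamable.

```xml
<!-- Streaming sketch; namespace URIs assume UBL 2 catalogues. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:doc="urn:oasis:names:specification:ubl:schema:xsd:Catalogue-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <!-- Request streamed processing for the unnamed mode. -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/">
    <Errors>
      <!-- One forward pass over the IDs; $seen holds every ID read so far. -->
      <xsl:iterate select="doc:Catalogue/cac:CatalogueLine/cbc:ID">
        <xsl:param name="seen" as="map(xs:string, xs:boolean)" select="map{}"/>
        <!-- The only consuming expression: read the ID's string value once. -->
        <xsl:variable name="id" as="xs:string" select="string(.)"/>
        <xsl:if test="map:contains($seen, $id)">
          <Error>
            <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
            <Duplicated-ID>{$id}</Duplicated-ID>
          </Error>
        </xsl:if>
        <!-- Pass the updated map on to the next cbc:ID. -->
        <xsl:next-iteration>
          <xsl:with-param name="seen" select="map:put($seen, $id, true())"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </Errors>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the grouping version, this reports every repeat after the first occurrence, so an ID that appears three times produces two errors.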