Performance issue when using "following-sibling"

DanOvergaard

Post by DanOvergaard »

Hi,

I am facing a performance issue with an XSLT test that uses "following-sibling", and I would like to know if anybody has a suggestion for optimizing the test.

The problem is that the test time increases exponentially with the number of lines, and some of the XML files have 80,000 or more lines.

I have attached an XSLT file with a test using "following-sibling" (the original test) and also a "for-each-group" test for comparison, but its performance is as bad as with "following-sibling".

The test is used to make sure that each line identifier (cbc:ID) is unique.

Any comments and suggestions are appreciated.

Rgds,
Dan
Attachments:
LargeTestFile.zip (388.08 KiB)
MediumTestFile.zip (152.39 KiB)
XSLT.zip (871 Bytes)
HugeTestFIle.zip (760 KiB)

Martin Honnen

Re: Performance issue when using "following-sibling"

Post by Martin Honnen »

Can you tell us which XSLT processor you are using to run the tests? Is that Saxon 12? And if so, will it be the HE edition or can you also use the EE edition provided in oXygen?

I would certainly think that doing e.g.

<!-- The {...} text value templates below need expand-text="yes"
     on an enclosing element, e.g. on xsl:stylesheet. -->
<xsl:template match="doc:Catalogue">
  <!-- Group all lines by ID; any group with a second member is a duplicate. -->
  <xsl:for-each-group select="cac:CatalogueLine" group-by="cbc:ID">
    <xsl:if test="current-group()[2]">
      <Error context="{name(parent::*)}/{name()}">
        <Pattern>current-group()[2]</Pattern>
        <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
        <Duplicated-ID>{current-grouping-key()}</Duplicated-ID>
        <XPath>{path()}</XPath>
      </Error>
    </xsl:if>
  </xsl:for-each-group>
</xsl:template>
should perform way better than your code using following-sibling and preceding-sibling repeatedly.
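
For comparison, another common way to avoid scanning the sibling axes is an xsl:key lookup. The following is only a minimal sketch, not the code from the attached XSLT.zip: the key name line-by-id and the Errors wrapper element are made up for illustration, and the three namespace URIs are assumed to be the usual UBL 2 ones, so adjust them to whatever your documents declare.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="urn:oasis:names:specification:ubl:schema:xsd:Catalogue-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    exclude-result-prefixes="#all"
    expand-text="yes">
  <!-- Namespace URIs above are assumptions (standard UBL 2). -->

  <!-- Hypothetical key: indexes every CatalogueLine by its cbc:ID. -->
  <xsl:key name="line-by-id" match="cac:CatalogueLine" use="cbc:ID"/>

  <xsl:template match="doc:Catalogue">
    <Errors>
      <!-- Keep only lines whose ID is shared by at least two lines. -->
      <xsl:for-each select="cac:CatalogueLine[key('line-by-id', cbc:ID)[2]]">
        <Error context="{name(parent::*)}/{name()}">
          <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
          <Duplicated-ID>{cbc:ID}</Duplicated-ID>
          <XPath>{path()}</XPath>
        </Error>
      </xsl:for-each>
    </Errors>
  </xsl:template>

</xsl:stylesheet>
```

Note that this variant reports every line that carries a duplicated ID (including the first one), whereas the grouping version reports each duplicated ID once.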

If you can use EE, I would also suggest testing, for large data sets, whether XSLT 3 streaming (capturing the IDs in a map, for instance) is faster than traditional tree-based processing that navigates the sibling axes.
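
To make the streaming idea more concrete, here is an untested sketch of one way to collect the IDs in a map while streaming, using xsl:source-document and xsl:iterate. It assumes a streaming-capable processor such as Saxon EE. The input-uri parameter, the seen map parameter and the Errors wrapper are hypothetical names chosen for this example, and the namespace URIs are again assumed to be the usual UBL 2 ones.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:doc="urn:oasis:names:specification:ubl:schema:xsd:Catalogue-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    exclude-result-prefixes="#all"
    expand-text="yes">
  <!-- Namespace URIs above are assumptions (standard UBL 2). -->

  <!-- Hypothetical parameter: URI of the catalogue file to check. -->
  <xsl:param name="input-uri" as="xs:string" required="yes"/>

  <xsl:template name="xsl:initial-template">
    <Errors>
      <xsl:source-document href="{$input-uri}" streamable="yes">
        <!-- Visit each line ID once, in document order. -->
        <xsl:iterate select="doc:Catalogue/cac:CatalogueLine/cbc:ID">
          <!-- IDs seen so far; starts as an empty map. -->
          <xsl:param name="seen" as="map(xs:string, xs:boolean)" select="map{}"/>
          <!-- Read the ID value once; nothing below touches the streamed node again. -->
          <xsl:variable name="id" as="xs:string" select="string(.)"/>
          <xsl:if test="map:contains($seen, $id)">
            <Error>
              <Description>[F-CAT248] CatalogueLine.ID must be unique within the document instance</Description>
              <Duplicated-ID>{$id}</Duplicated-ID>
            </Error>
          </xsl:if>
          <xsl:next-iteration>
            <xsl:with-param name="seen" select="map:put($seen, $id, true())"/>
          </xsl:next-iteration>
        </xsl:iterate>
      </xsl:source-document>
    </Errors>
  </xsl:template>

</xsl:stylesheet>
```

With this approach only the map of IDs is held in memory, not the whole tree, and an Error is written for the second and every later occurrence of an ID. A relative input-uri is resolved against the stylesheet location, so an absolute URI is the safer choice.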