
Re: [xsl] Performance of predicate-based patterns

Subject: Re: [xsl] Performance of predicate-based patterns
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 23 Jan 2015 19:52:46 -0000

I don't think anyone at all familiar with normal DITA XSLT practice would
use anything other than [contains(@class, ' foo/bar ')] or the DITA
Community df:class() function:

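<!-- Assumes the xs: and df: prefixes are declared on xsl:stylesheet. -->
<xsl:function name="df:class" as="xs:boolean">
    <xsl:param name="elem" as="element()"/>
    <xsl:param name="classSpec" as="xs:string"/>

    <!-- '$' in the regex is a workaround for a bug in MarkLogic 3.x and
         for a common user error, where the trailing space in the class=
         attribute is dropped.
      -->
    <xsl:variable name="normalizedClassSpec" as="xs:string"
                  select="normalize-space($classSpec)"/>
    <!-- Matches ' spec ' anywhere, or ' spec' at the end of @class. -->
    <xsl:variable name="result" as="xs:boolean"
                  select="matches($elem/@class,
                                  concat(' ', $normalizedClassSpec, ' | ',
                                         $normalizedClassSpec, '$'))"/>

    <xsl:sequence select="$result"/>
</xsl:function>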

The df:class() function handles the case where an @class attribute value
is missing its required trailing space (a problem that MarkLogic used to
cause, but that I think was fixed in ML 4).
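To make the difference concrete, here is a small Python sketch that mimics the two XPath tests at the string level: the plain contains() check and the regex that df:class() builds. It is only an illustration, not DITA Community code; the function names are hypothetical, and it assumes the usual DITA @class layout of space-separated tokens.

```python
import re

def contains_match(class_attr: str, spec: str) -> bool:
    # Mirrors contains(@class, ' topic/p '): a plain substring test for the
    # class token padded with one space on each side.
    return f" {spec} " in class_attr

def df_class_match(class_attr: str, spec: str) -> bool:
    # Mirrors the regex df:class() builds: ' spec ' anywhere in the value,
    # or ' spec' at the very end (tolerating a dropped trailing space).
    normalized = " ".join(spec.split())  # like XPath normalize-space()
    # re.escape is an addition; the XSLT original does not escape the spec.
    token = re.escape(normalized)
    pattern = f" {token} | {token}$"
    # XPath matches() is unanchored, like re.search().
    return re.search(pattern, class_attr) is not None

well_formed = "- topic/p "
trimmed = "- topic/p"  # trailing space lost

print(contains_match(well_formed, "topic/p"), df_class_match(well_formed, "topic/p"))
print(contains_match(trimmed, "topic/p"), df_class_match(trimmed, "topic/p"))
```

With the trailing space intact both tests agree (True True); once it has been dropped, only the regex form still matches (False True).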

If there's a more efficient way to match values in the @class attribute,
I'd certainly like to know about it.


Eliot Kimber, Owner
Contrext, LLC

On 1/23/15, 8:19 AM, "Graydon graydon@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>On Fri, Jan 23, 2015 at 11:28:31AM -0000, Michael Kay mike@xxxxxxxxxxxx wrote:
>> We've started doing some performance work in Saxon on the DITA
>> stylesheets, which use large numbers of match patterns in the form
>> <xsl:template match="*[contains(@class, ' token ')]">
>If anybody ever starts using XSLT 2.0 for DITA processing, there are
>going to be things like
><xsl:template match="*[(tokenize(@class,'\p{Zs}+')[normalize-space()])[2] eq 'topic/li']">
>showing up.  ("some $x in tokenize(@class,...."  seems pretty likely,
>> Currently these require a very inefficient sequential search to find
>> the matching rule for each element.
>> Does anyone know of any other commonly-used stylesheets (or even,
>> uncommonly used ones) which show similar characteristics, that is,
>> large numbers of match patterns using predicate matching only, with no
>> explicit element names? We'd like any optimizations we implement to be
>> as general-purpose as possible.
>I've done some conversion work on legal documents where the goal was to
>get everything back on a single schema after a couple decades of
>evolution in the element names of various DTDs.  Matches of the form
><xsl:template match="*[name() = ('P','NP','PARA')]">
>showed up a fair bit to match on the abstract "that's a paragraph"
>across the range of evolved element names.
>There was also a fair bit of
><xsl:template match="*[not(name() = ('PARA','LIST','TABLE'))]">
>used as general "we don't think there's anything but those in the data
>but let's not make rash assumptions" surprise handler templates.
>-- Graydon
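The sequential-search problem Michael Kay describes can be illustrated with a hypothetical Python sketch. It is not how Saxon works; the class, method and rule names are invented for the example. Instead of testing every predicate-only rule in turn, it indexes each rule under the literal string its predicate tests (a @class token such as 'topic/p', or each name in a list like ('P','NP','PARA')), so an element only examines the rules its own name and @class tokens point to. Two simplifications are assumed: @class tokens are separated by single spaces, so splitting on whitespace agrees with contains(@class, ' topic/p '); and the fallback stands in for a catch-all template like the not(name() = ...) one above, firing only when nothing else matches.

```python
from collections import defaultdict

class RuleIndex:
    """Hypothetical index of predicate-only template rules, keyed by the
    literal string each predicate tests, so lookup avoids a full scan."""

    def __init__(self):
        self.by_class_token = defaultdict(list)  # 'topic/p' -> [(priority, order, rule)]
        self.by_name = defaultdict(list)         # 'PARA' -> [(priority, order, rule)]
        self.fallback = None                     # stands in for a catch-all template
        self._order = 0                          # declaration order, for tie-breaking

    def add_class_rule(self, token, priority, rule):
        # Stands in for match="*[contains(@class, ' token ')]".
        self.by_class_token[token].append((priority, self._order, rule))
        self._order += 1

    def add_name_rule(self, names, priority, rule):
        # Stands in for match="*[name() = ('P','NP','PARA')]".
        for name in names:
            self.by_name[name].append((priority, self._order, rule))
        self._order += 1

    def match(self, name, class_attr=""):
        # Only rules keyed by this element's own name or @class tokens are
        # candidates; all other rules are never examined.
        candidates = list(self.by_name.get(name, []))
        for token in class_attr.split():
            candidates.extend(self.by_class_token.get(token, []))
        if not candidates:
            return self.fallback
        # Highest priority wins; ties go to the rule declared last, like the
        # recovery XSLT 2.0 allows for conflicting template rules.
        return max(candidates)[2]

index = RuleIndex()
index.add_class_rule("topic/p", 0, "paragraph rule")
index.add_class_rule("topic/ph", 0, "phrase rule")
index.add_class_rule("hi-d/b", 1, "bold rule")
index.add_name_rule(["P", "NP", "PARA"], 0, "legacy paragraph rule")
index.fallback = "surprise handler"

print(index.match("p", "- topic/p "))
print(index.match("b", "- topic/ph hi-d/b "))
print(index.match("NP"))
print(index.match("FOO"))
```

The cost of a lookup then depends on how many tokens an element carries rather than on how many rules the stylesheet declares, which is the property that matters when a stylesheet has hundreds of predicate-only patterns.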
