Subject: Re: [xsl] XSL performance question: running count of attributes using axes and sum()
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 09 Apr 2009 15:09:48 -0400

Hi Mark,

I think your solution lies in multiple passes: preprocess your data so that the values you need are explicit before the presentation phase.

There are a number of ways you could go about it, but I'd consider something like this:

1. annotate words with their lengths, i.e., the total @length of their syllables (make word/@length explicit)
1b. optionally, do the same with lines
2. Use a sibling-recursion approach to calculate offsets at whatever level(s) (syllable, word and/or line) you like
3. Then work from the offsets instead of the brute-force calculations
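Steps 1 and 1b might look something like the following in XSLT 1.0. This is a minimal sketch, assuming the element names from the sample poem quoted below: it copies everything unchanged, except that each word and line gains a length attribute holding the sum of the @length values of the syllables inside it. Note that it sums the raw values; if only floored, non-elided lengths should count, those would have to be computed first.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute as-is by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Steps 1 and 1b: give each word and line a length attribute
       equal to the sum of the raw @length values of its syllables -->
  <xsl:template match="word | line">
    <xsl:copy>
      <!-- existing attributes must be written before any child content -->
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="length">
        <xsl:value-of select="sum(.//syl/@length)"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The .//syl path also picks up syllables that sit directly in a line rather than in a word, such as "que" in the first sample line.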


Sibling recursion works like this:

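<xsl:template match="x" mode="add-offsets">
  <!-- running total of @length over the preceding x siblings -->
  <xsl:param name="offset" select="0"/>
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:attribute name="offset">
      <xsl:value-of select="$offset"/>
    </xsl:attribute>
    <!-- keep the element's content -->
    <xsl:copy-of select="node()"/>
  </xsl:copy>
  <!-- hand the updated total on to the next x sibling only -->
  <xsl:apply-templates select="following-sibling::x[1]" mode="add-offsets">
    <xsl:with-param name="offset" select="$offset + @length"/>
  </xsl:apply-templates>
</xsl:template>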

You would kick this off by applying templates to the x[1] (only) of any run of x siblings. Each x then applies templates only to the next x sibling, passing its running total along, so processing walks forward among the x siblings until there aren't any left. Essentially, the technique forces a forward traversal of the document, which is what allows parameters to be passed along. Ordinarily one doesn't want to do this, since it prevents the processor from optimizing its traversal; but if you need to do some kinds of intensive calculations at document-wide scope (as here), it can be worth it, because each node's offset comes from the parameter instead of from re-summing all of its predecessors.
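A complete, minimal stylesheet showing the kick-off together with the recursive template might look like this. It is a sketch: x stands for whatever element is being numbered, and the container pattern *[x] is a placeholder for "any element that has x children".

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Kick-off: copy the container and start the chain at its first x child
       only (in this minimal version, non-x children are not copied) -->
  <xsl:template match="*[x]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="x[1]" mode="add-offsets"/>
    </xsl:copy>
  </xsl:template>

  <!-- Each x records the running total, then passes it to the next x sibling -->
  <xsl:template match="x" mode="add-offsets">
    <xsl:param name="offset" select="0"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="offset">
        <xsl:value-of select="$offset"/>
      </xsl:attribute>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::x[1]" mode="add-offsets">
      <xsl:with-param name="offset" select="$offset + @length"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

Given an input such as <list><x length="2"/><x length="1"/><x length="2"/></list>, the three x elements should come out with offset values 0, 2 and 3.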

Preprocessing to calculate the lengths of words and lines would get you around the fact that your syllables are not all siblings: with word and line offsets in hand, a syllable's total offset is its word's offset plus its offset within the word, so you can calculate total offsets instead of just offsets relative to their containers. Another way to deal with this would be to recurse along the following:: axis instead of the following-sibling:: axis, but (depending on the processor) you might not see the same speed gains there.
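A minimal sketch of a following:: recursion applied to the sample data below, assuming XSLT 1.0. The table/tr wrapper markup is an assumption; the highlight rule and the elide test are copied from the template in the question. It builds one row per line and passes the running total from each syllable to the next syllable in document order, stopping at the end of the line. Because each step adds floor(@length) for a single node, no sum over a node-set is needed, so it also sidesteps the floor() problem.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- One table for the poem, one row per line (wrapper markup assumed) -->
  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="poem/line"/>
    </table>
  </xsl:template>

  <!-- Start each row's chain at the line's first syl, however deeply nested -->
  <xsl:template match="line">
    <tr>
      <xsl:apply-templates select="(.//syl)[1]" mode="cells"/>
    </tr>
  </xsl:template>

  <xsl:template match="syl" mode="cells">
    <!-- $total: floored lengths of the non-elided syls before this one in the line -->
    <xsl:param name="total" select="0"/>
    <td>
      <!-- same highlight rule as the original template -->
      <xsl:if test="@length = 2 and $total mod 4 = 0">
        <xsl:attribute name="style">background-color:#EEFFEE;</xsl:attribute>
      </xsl:if>
      <xsl:value-of select="."/>
    </td>
    <!-- this syl's contribution: floor(@length), or 0 if elided -->
    <xsl:variable name="step">
      <xsl:choose>
        <xsl:when test="not(@elide) or @elide = 'false'">
          <xsl:value-of select="floor(@length)"/>
        </xsl:when>
        <xsl:otherwise>0</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- next syl in document order, but only if it is in the same line -->
    <xsl:variable name="line" select="ancestor::line[1]"/>
    <xsl:apply-templates mode="cells"
        select="following::syl[1][generate-id(ancestor::line[1]) = generate-id($line)]">
      <xsl:with-param name="total" select="$total + $step"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

Each syllable's total is computed once from the parameter rather than by re-summing its predecessors, though how quickly following::syl[1] is evaluated still depends on the processor. Tracing the first sample line by hand, the highlighted cells should be Ar, rum, no, iae, pri and o. The recursion is only as deep as the number of syllables in a line.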

On the other hand, depending on the size of the data set, you might find that simply preprocessing to calculate lengths at the word and line level, without going on to calculate offsets, helps enough by itself.

If you could use XSLT 2.0, you'd have more options and techniques at your disposal.

Also, some processors have extensions that are useful for this sort of thing.

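floor() is an XPath 1.0 function, just like ceiling(), and a conformant 1.0 processor will respect it. What XPath 1.0 does not allow is a function call as a step in a path, as in preceding::syl[...]/floor(@length): that is XPath 2.0 syntax, which is presumably what your processor objects to until you switch to version 2.0.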
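Because sum() in XSLT 1.0 can only total the values of nodes, not values computed from them, one way to get floored lengths into a 1.0 running total is to store them as attributes in a preprocessing pass. The sketch below assumes XSLT 1.0 and a hypothetical attribute name, qty: it copies the poem unchanged, except that each syl gains qty = floor(@length), or 0 for an elided syllable (mirroring the elide test in the template quoted below).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute as-is by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Each syl gains a hypothetical qty attribute: floor(@length), or 0 when elided -->
  <xsl:template match="syl">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="qty">
        <xsl:choose>
          <xsl:when test="not(@elide) or @elide = 'false'">
            <xsl:value-of select="floor(@length)"/>
          </xsl:when>
          <xsl:otherwise>0</xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

In the presentation pass, sum(preceding::syl[ancestor::line/@id = $line_id]/@qty) is then legal XPath 1.0. It still walks the preceding:: axis once per syllable, so this addresses the 1.0 restriction rather than the speed problem.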

Cheers,
Wendell

I'm getting quite comfortable using XSL. Since I am using a lot more heavy-duty XSL, I am now hitting performance barriers. My question to the forum is, for once, not a beginner's question!

In transforming the <syl> tags below into HTML table cells to display them, I need to color a cell green when the running total of the @length attributes is a multiple of four. Ideally I would keep the running total in a variable, but that is not the XSL-esque way, so I am using axes instead. I have tried solutions with count and sum, but performance is slow: 756 lines like the ones below mean thousands of syllables to check, each with its own axis computation, and the complete xform takes more than an hour!

Can anyone point me to a solution that is more performant yet still elegant/simple?

An aside: it seems that ceiling() is an XPath 1.0 function but, oddly enough, floor() is not: Altova SPY complains about floor until I change the stylesheet to version 2.0 (sigh). I would love this to transform in XSL 1.0 if possible, and rounding each length down to an integer is essential to achieve the correct formatting result.

Thanks in advance for any help on this.

XML:

<poem>
        <line id="1">
                <word id="1">
                        <syl length="2">Ar</syl>
                        <syl length="1">ma</syl>
                </word>
                <word id="2">
                        <syl length="1">vi</syl>
                        <syl length="2">rum</syl>
                </word>
                <syl length="1">que</syl>
                <word id="3">
                        <syl length="1">ca</syl>
                        <syl length="2">no</syl>
                </word> ,
                <word id="4">
                        <syl length="2">Tro</syl>
                        <syl length="2">iae</syl>
                </word>
                <word id="5">
                        <syl length="2">qui</syl>
                </word>
                <word id="6">
                        <syl length="2">pri</syl>
                        <syl length="1">mus</syl>
                </word>
                <word id="7">
                        <syl length="1">ab</syl>
                </word>
                <word id="8">
                        <syl length="2">o</syl>
                        <syl length="2">ris</syl>
                </word>
        </line>
        <line id="2">
                <word>
                        <syl length="2">li</syl>
                        <syl length="1.5">to</syl>
                        <syl length="1">ra</syl>
                </word> ,
                <word id="15">
                        <syl length="2">mul</syl>
                        <syl elide="true" length="1">tum</syl>
                </word>
                <word id="16">
                        <syl length="2">il</syl>
                        <syl elide="true" length="1">le</syl>
                </word>
                <word id="17">
                        <syl length="2">et</syl>
                </word>
                <word id="18">
                        <syl length="2">ter</syl>
                        <syl length="2">ris</syl>
                </word>
                <word id="19">
                        <syl length="2">iac</syl>
                        <syl length="2">ta</syl>
                        <syl length="1">tus</syl>
                </word>
                <word id="20">
                        <syl length="1">et</syl>
                </word>
                <word id="21">
                        <syl length="2">al</syl>
                        <syl length="2">to</syl>
                </word>
        </line>
</poem>


XSL template:


<xsl:template match="syl">

<xsl:variable name="line_id"><xsl:value-of select="node()/ancestor::line/@id" /></xsl:variable>

<xsl:variable name="current_quantity"><xsl:value-of select="sum(preceding::syl[ancestor::line/@id = $line_id and (not(@elide) or @elide='false') ]/floor(@length))" /></xsl:variable>

<xsl:variable name="color"><xsl:choose><xsl:when test="@length=2 and ($current_quantity mod 4 = 0)">background-color:#EEFFEE;</xsl:when></xsl:choose></xsl:variable>

<td style="{$color}"><xsl:value-of select="text()" /></td>

</xsl:template>


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

