
Re: [xsl] Optimization Question


Subject: Re: [xsl] Optimization Question
From: Tony Lavinio <xml1@xxxxxxxxxxx>
Date: Mon, 31 Jan 2005 20:42:30 -0500

There are a couple of XSLT profilers out there.  Stylus Studio, for
example, has one that hooks into its built-in processor, Saxon, or
Xalan-J.  It would tell you where the time is being spent.

See http://www.stylusstudio.com/xslt_profiler.html

The web page is a little out of date; the current version has also added
Saxon 8 profiling, and the output includes cute little graphs.

In the profilers I've seen, including this one, the output format
itself is XML, letting you do some interesting transforms with the
profiling data.



On 01-31-2005 6:41 PM, Michael Nguyen wrote:

All,
I've been trying to find a more efficient way of transforming a large group of files using Xalan. I have about 435,000 XML documents (ranging from 600 bytes to 8 KB) that I need to transform. Each document undergoes exactly the same transformation. Part of the transformation involves a lookup in another XML document (about 5 MB). The source XML files are stored in a hierarchical structure and are distributed across 40 directories. If I perform the transformations without the external document lookup, the entire process takes about two hours. With the lookup, it processes roughly 8,000 documents in 12 hours. I'm running this on a 3 GHz P4 system with 1 GB of RAM. I'm using xsl:key for the lookup, like the following:


<xsl:key name="all-groups-key" match="GROUP_DOC/COVER_SHEET/TITLE" use="ancestor::GROUP_DOC/@CRG" />
<xsl:variable name="all-groups" select="document('../current/groups.xml')//GROUP_DOC/COVER_SHEET/TITLE" />


<!-- ... code snipped ... -->
<xsl:template match="CC">
  <xsl:variable name="group_name" select="." />
  <xsl:variable name="crg" select="substring-after(., '-')" />

  <xsl:value-of select="$group_name" />
  <xsl:variable name="group_full_name"
      select="$all-groups[generate-id() = generate-id(key('all-groups-key', $crg))]" />
  <xsl:choose>
    <xsl:when test="$group_full_name != ''">
      <a href="../../../group/{$crg}.html"><xsl:value-of select="$group_full_name" /></a>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$group_name" />
    </xsl:otherwise>
  </xsl:choose>

  <xsl:if test="following-sibling::CC"><br /></xsl:if>
</xsl:template>
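For comparison, here is a sketch of a more direct form of the same lookup. In XSLT 1.0, key() searches only the document that contains the context node. The predicate above does reach groups.xml, because inside $all-groups[...] the context node is each TITLE in turn; for the same reason, though, key() and generate-id() are evaluated once per TITLE for every CC, so unless the processor optimizes the expression, each lookup walks every TITLE in the 5 MB file. A common alternative is to make the groups.xml root the context with xsl:for-each and call key() once. The Java harness below assumes the JDK's built-in javax.xml.transform processor (derived from Xalan's XSLTC) rather than Xalan 2.x itself; KeyLookupDemo, run, the $groups-doc variable, and the small in-memory groups.xml served through a URIResolver are hypothetical stand-ins.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class KeyLookupDemo {
    // Hypothetical stand-in for ../current/groups.xml.
    static final String GROUPS =
        "<GROUPS>"
      + "<GROUP_DOC CRG='101'><COVER_SHEET><TITLE>Alpha Group</TITLE></COVER_SHEET></GROUP_DOC>"
      + "<GROUP_DOC CRG='202'><COVER_SHEET><TITLE>Beta Group</TITLE></COVER_SHEET></GROUP_DOC>"
      + "</GROUPS>";

    // Trimmed-down CC template using a hypothetical $groups-doc variable.
    // xsl:for-each makes the groups.xml root the context node, so key()
    // searches groups.xml directly instead of being called once per TITLE.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:key name='all-groups-key' match='GROUP_DOC/COVER_SHEET/TITLE'"
      + " use='ancestor::GROUP_DOC/@CRG'/>"
      + "<xsl:variable name='groups-doc' select=\"document('../current/groups.xml')\"/>"
      + "<xsl:template match='/'><out><xsl:apply-templates select='//CC'/></out></xsl:template>"
      + "<xsl:template match='CC'>"
      + "<xsl:variable name='crg' select=\"substring-after(., '-')\"/>"
      + "<xsl:variable name='group_full_name'>"
      + "<xsl:for-each select='$groups-doc'>"
      + "<xsl:value-of select=\"key('all-groups-key', $crg)\"/>"
      + "</xsl:for-each>"
      + "</xsl:variable>"
      + "<xsl:choose>"
      + "<xsl:when test=\"$group_full_name != ''\">"
      + "<a href='../../../group/{$crg}.html'><xsl:value-of select='$group_full_name'/></a>"
      + "</xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='.'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "<xsl:if test='following-sibling::CC'><br/></xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms one source document; groups.xml is served from memory. */
    static String run(String input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            // Demo only: answer document('../current/groups.xml') from the GROUPS string.
            t.setURIResolver((href, base) -> href.endsWith("groups.xml")
                ? new StreamSource(new StringReader(GROUPS), "file:///current/groups.xml")
                : null);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<DOC><CC>G-101</CC><CC>G-999</CC></DOC>"));
    }
}
```

A CC whose code is not in groups.xml (G-999 in the sample) falls through to xsl:otherwise and is copied as-is, matching the original template's behavior.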

----------------------
The lookup is done for each CC tag in the document, and each document has at least one CC that matches.
It seems to me that the difference in processing time is due solely to the lookup code above: as soon as I remove it, all 435,000 files are processed in comparatively little time. Once I put the code back in, however, it grinds my machine to a halt trying to process the files. It seems to be loading the groups.xml file each time I perform a transformation. What I would like to do is keep the stylesheet and the lookup XML in memory, to reduce the number of times groups.xml is loaded.
Thanks,
Michael Nguyen
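One way to keep both the stylesheet and groups.xml in memory, assuming the batch is driven from Java through the standard javax.xml.transform (TrAX) API: compile the stylesheet once into a Templates object, create a Transformer from it for each file, and give every Transformer the same URIResolver, which parses groups.xml on first request and afterwards returns the cached DOM wrapped in a DOMSource. This is a sketch, not a drop-in fix; BatchTransform, CachingResolver, demo, and the in-memory GROUPS string are hypothetical. It avoids re-reading and re-parsing the 5 MB file, but the processor may still rebuild its own tree and key index from the DOM on every run, so it is worth timing (or profiling) before and after.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BatchTransform {
    // Hypothetical stand-in for ../current/groups.xml (the real file is about 5 MB).
    static final String GROUPS =
        "<GROUPS>"
      + "<GROUP_DOC CRG='101'><COVER_SHEET><TITLE>Alpha Group</TITLE></COVER_SHEET></GROUP_DOC>"
      + "<GROUP_DOC CRG='202'><COVER_SHEET><TITLE>Beta Group</TITLE></COVER_SHEET></GROUP_DOC>"
      + "</GROUPS>";

    // Minimal stylesheet: looks up each CC's code in groups.xml via xsl:key.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='all-groups-key' match='GROUP_DOC/COVER_SHEET/TITLE'"
      + " use='ancestor::GROUP_DOC/@CRG'/>"
      + "<xsl:variable name='groups-doc' select=\"document('../current/groups.xml')\"/>"
      + "<xsl:template match='CC'>"
      + "<xsl:variable name='crg' select=\"substring-after(., '-')\"/>"
      + "<xsl:for-each select='$groups-doc'>"
      + "<xsl:value-of select=\"key('all-groups-key', $crg)\"/>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Parses groups.xml on first request, then hands out the cached DOM. */
    static class CachingResolver implements URIResolver {
        private Document cached; // parsed groups.xml, kept for the whole batch
        int parseCount = 0;

        @Override
        public Source resolve(String href, String base) throws TransformerException {
            if (!href.endsWith("groups.xml")) {
                return null; // let the processor resolve other URIs itself
            }
            if (cached == null) {
                try {
                    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                    dbf.setNamespaceAware(true);
                    // Demo only: a real batch would parse the file on disk here.
                    cached = dbf.newDocumentBuilder()
                        .parse(new InputSource(new StringReader(GROUPS)));
                } catch (Exception e) {
                    throw new TransformerException(e);
                }
                parseCount++;
            }
            return new DOMSource(cached, "file:///current/groups.xml");
        }
    }

    /** Compiles the stylesheet once; the Templates object is reusable. */
    static Templates compile() throws TransformerException {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSL)));
    }

    /** Transforms every input with the same Templates and resolver. */
    static List<String> transformAll(Templates templates, List<String> inputs,
                                     URIResolver resolver) throws TransformerException {
        List<String> results = new ArrayList<>();
        for (String input : inputs) {
            Transformer t = templates.newTransformer(); // no stylesheet recompilation
            t.setURIResolver(resolver);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    /** Runs three small hypothetical documents through one Templates/resolver pair. */
    static List<String> demo(CachingResolver resolver) {
        try {
            return transformAll(compile(), Arrays.asList(
                "<DOC><CC>G-101</CC></DOC>",
                "<DOC><CC>G-202</CC></DOC>",
                "<DOC><CC>G-101</CC></DOC>"), resolver);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CachingResolver resolver = new CachingResolver();
        System.out.println(demo(resolver) + " parses=" + resolver.parseCount);
    }
}
```

Templates objects are documented as safe to share between threads, while a Transformer is not, so a multi-threaded version of this loop could share one Templates but would need a Transformer per thread (and a synchronized resolver, which the one above is not).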

