
RE: [xsl] Optimization Question


Subject: RE: [xsl] Optimization Question
From: Pieter Reint Siegers Kort <pieter.siegers@xxxxxxxxxxx>
Date: Mon, 31 Jan 2005 18:35:55 -0600

Hi Michael,

Interesting problem - I guess you'd need a professional profiling tool to do
it the correct way... :-)

Anyway, tell us first what your environment is, and what you do to cache the
repetitive operations, like parsing the source documents involved and
compiling the stylesheet. Loading the 5MB document for every transformation
seems to be the application performance killer.
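
As a minimal sketch of that kind of caching, assuming the Java version of
Xalan driven through the standard JAXP API (the thread does not say which
environment is in use): the stylesheet is compiled once into a Templates
object, groups.xml is parsed once into a DOM, and a URIResolver hands that
DOM back whenever document() asks for it. The class name CachedLookupSketch,
the sample data, the systemId URIs, and the cut-down stylesheet are all
hypothetical. The stylesheet also swaps the generate-id() filter from the
posted code for the usual XSLT 1.0 idiom of switching context with
xsl:for-each, so that key() searches the lookup document directly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CachedLookupSketch {
    // Cut-down, hypothetical stylesheet: same key as in the posted code,
    // but the lookup changes context to the external document first.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='all-groups-key' match='GROUP_DOC/COVER_SHEET/TITLE'"
      + " use='ancestor::GROUP_DOC/@CRG'/>"
      + "<xsl:template match='CC'>"
      + "<xsl:variable name='crg' select=\"substring-after(., '-')\"/>"
      + "<xsl:for-each select=\"document('../current/groups.xml')\">"
      + "<xsl:value-of select=\"key('all-groups-key', $crg)\"/>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Tiny stand-in for the 5MB groups.xml (made-up data).
    static final String GROUPS =
        "<GROUPS><GROUP_DOC CRG='123'><COVER_SHEET>"
      + "<TITLE>Cardiology Group</TITLE>"
      + "</COVER_SHEET></GROUP_DOC></GROUPS>";

    private final Templates templates;   // compiled stylesheet, reusable
    private final URIResolver resolver;  // serves the cached lookup DOM

    CachedLookupSketch(String xsl, String groupsXml) throws Exception {
        // Compile the stylesheet once. The systemId is a made-up base URI
        // so that relative hrefs in document() have something to resolve against.
        StreamSource xslSource = new StreamSource(new StringReader(xsl));
        xslSource.setSystemId("file:///sketch/xsl/transform.xsl");
        templates = TransformerFactory.newInstance().newTemplates(xslSource);

        // Parse the lookup document once; XSLT needs a namespace-aware DOM.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document groups = dbf.newDocumentBuilder()
                             .parse(new InputSource(new StringReader(groupsXml)));

        // Return the cached DOM for groups.xml; null means "resolve as usual".
        resolver = (href, base) -> href.endsWith("groups.xml")
            ? new DOMSource(groups, "file:///sketch/current/groups.xml")
            : null;
    }

    // One transformation: a cheap new Transformer from the shared Templates.
    String transform(String sourceXml) throws Exception {
        Transformer t = templates.newTransformer();
        t.setURIResolver(resolver);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(sourceXml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        CachedLookupSketch sketch = new CachedLookupSketch(XSL, GROUPS);
        // In a real run this would loop over the source files instead.
        System.out.println(sketch.transform("<DOC><CC>AB-123</CC></DOC>"));
    }
}
```

Whether this removes the whole cost depends on the processor: it can still
build its own internal tree from the cached DOM on each run, so it is worth
measuring before and after.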

Cheers,
<prs/>

-----Original Message-----
From: Michael Nguyen [mailto:mnguyen@xxxxxxxxxx] 
Sent: Monday, January 31, 2005 5:41 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] Optimization Question

All,
    I've been trying to find a more efficient way of transforming a large
group of files using Xalan.  I have about 435,000 XML documents (sizes
ranging from 600 bytes to 8 KB) that I need to transform.  Each document
undergoes exactly the same transformation.  Part of the transformation
involves a lookup in another XML document (about 5 MB).  The source XML files
are stored in a hierarchical structure, distributed across 40 directories.
If I perform the transformations without the external document lookup, the
entire process takes about two hours.  With the lookup, it manages roughly
8,000 documents in 12 hours.  I'm running this on a P4 3 GHz system with
1 GB of RAM.
I'm using xsl:key for the lookup, like the following:

<xsl:key name="all-groups-key" match="GROUP_DOC/COVER_SHEET/TITLE" 
use="ancestor::GROUP_DOC/@CRG" />
<xsl:variable name="all-groups" 
select="document('../current/groups.xml')//GROUP_DOC/COVER_SHEET/TITLE" />

... code ...snip
    <xsl:template match="CC" >
        <xsl:variable name="group_name" select ="." />
        <xsl:variable name="crg" select="substring-after(.,'-')" />

            <xsl:value-of select="$group_name"  />
        <xsl:variable name="group_full_name"
            select="$all-groups[generate-id() = generate-id(key('all-groups-key', $crg))]" />
        <xsl:choose>
            <xsl:when test="$group_full_name != ''" >
            <a href="../../../group/{$crg}.html"><xsl:value-of select="$group_full_name" /></a>
            </xsl:when>
            <xsl:otherwise>
            <xsl:value-of select="$group_name"  />
            </xsl:otherwise>
        </xsl:choose>

        <xsl:if test="following-sibling::CC" ><br /></xsl:if>
    </xsl:template>

----------------------
The lookup is done for each CC tag in the document.  Each document has at
least one CC that matches.
It seems to me that the difference in processing time is solely due to the
lookup code above: as soon as I remove it, all 435,000 files are processed
in comparatively little time (about two hours).  Once I put the code back
in, however, it grinds my machine to a halt trying to process the files.  It
seems to be loading the groups.xml file each time I perform the
transformation.
What I want to try is to keep this stylesheet and the lookup XML in memory,
to reduce the number of times the groups.xml file is loaded.

Thanks,
Michael Nguyen

--
-------------------------------------------------
Michael Nguyen
Senior Software Engineer, SKOLAR
Wolters Kluwer Health - Clinical Tools
1860 Embarcadero Rd, Suite 215
Palo Alto, CA 94303
Phone: 650-354-3025
mnguyen@xxxxxxxxxx

