RE: [xsl] Stylesheet Optimization -- How to Make It Faster
Subject: RE: [xsl] Stylesheet Optimization -- How to Make It Faster
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 28 Nov 2006 09:14:04 -0000
(a) It would be a nice courtesy if you could lay out the code so that we can read it.

(b) What XSLT processor are you using?

(c) The most obvious inefficiency is here:

    expand="{$abbreviations[.=$abbr]/following-sibling::expanded}"

This would benefit from use of keys.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Jeff Sese [mailto:jsese@xxxxxxxxxxxx]
> Sent: 28 November 2006 01:41
> To: Xsl-List
> Subject: [xsl] Stylesheet Optimization -- How to Make It Faster
>
> I have a stylesheet that adds mark-up to text nodes that match an
> abbreviation in a reference XML file. It's working nicely, but the
> processing time is very slow... I'm guessing because it's processing
> text nodes. An 800 KB file takes me about 25 minutes to process, and I
> have around 800 files to process (varying file sizes, some relatively
> small and some fairly large). Is there any way to optimize my
> stylesheet so that it can process the files faster?
>
> Here is my stylesheet:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:ati="http://www.asiatype.com/xslt-functions"
>     exclude-result-prefixes="xs ati">
>
>   <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
>
>   <xsl:variable name="abbreviations" as="element()+"
>       select="document('publishers_data.xml')/root/publisher/abbrev"/>
>
>   <xsl:template match="/">
>     <xsl:apply-templates/>
>   </xsl:template>
>
>   <xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">
>     <xsl:variable name="str" as="xs:string" select="."/>
>     <xsl:choose>
>       <xsl:when test="exists($abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))])">
>         <xsl:variable name="search-str" as="xs:string+"
>             select="$abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))]"/>
>         <xsl:variable name="replace" as="element()*">
>           <xsl:for-each select="$search-str">
>             <xsl:variable name="abbr" as="xs:string" select="."/>
>             <abbr type="title"
>                 expand="{$abbreviations[.=$abbr]/following-sibling::expanded}"><xsl:value-of select="$abbr"/></abbr>
>           </xsl:for-each>
>         </xsl:variable>
>         <xsl:sequence select="ati:replace-with-nodes($str, $search-str, $replace)"/>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:value-of select="$str"/>
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:template>
>
>   <xsl:template match="@*|element()|comment()|processing-instruction()" mode="#all">
>     <xsl:copy>
>       <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
>   </xsl:template>
>
>   <xsl:function name="ati:replace-with-nodes" as="node()+">
>     <xsl:param name="input" as="xs:string"/>
>     <xsl:param name="words-to-replace" as="xs:string*"/>
>     <xsl:param name="replacement" as="node()*"/>
>     <xsl:variable name="regex"
>         select="string-join(for $w in $words-to-replace return concat('(', ati:escape($w), ')'),'|')"/>
>     <xsl:analyze-string select="$input" regex="{$regex}">
>       <xsl:matching-substring>
>         <xsl:variable name="i" as="xs:integer"
>             select="(1 to count($words-to-replace))[regex-group(.)]"/>
>         <xsl:sequence select="$replacement[$i]"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         <xsl:value-of select="."/>
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>   </xsl:function>
>
>   <xsl:function name="ati:escape">
>     <xsl:param name="s" as="xs:string"/>
>     <xsl:sequence select="replace($s,'[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]','\\$0')"/>
>   </xsl:function>
>
> </xsl:stylesheet>
>
> Here's a short version of publishers_data.xml:
>
> <root>
>   <publisher>
>     <abbrev>Inschriften von Priene</abbrev>
>     <expanded>Inschriften von Priene</expanded>
>   </publisher>
>   <publisher>
>     <abbrev>P. Mil. Congr. XVIII</abbrev>
>     <expanded>Papiri documentari dell'Università Cattolica di Milano</expanded>
>   </publisher>
>   <publisher>
>     <abbrev>P. Jud. Des. Misc.</abbrev>
>     <expanded>Discoveries in the Judean Desert XXXVIII</expanded>
>   </publisher>
>   <!-- more publishers here -->
> </root>
>
> Here's a snippet of the source XML:
>
> <!-- preceding::node() of ab -->
> <ab lang="grk" n="1">
>   <foreign lang="grk">N N3N-N3N?N=N5 N:N1Oa=0 ON?a=:O NN1ON5a=7N?O</foreign>
>   <note place="margin">a c</note> <lb n="5"/>
>   <foreign lang="grk">OOa=9N=N?OO ON?a?& N<N5Oa=0 NN1N<N2a=;ON7N= N2N1ON9N;N5a=;ON1N=ON?O, a=ON5 N:N1a=6 NN9N?N=a=;ON9N?O a<&N= a= NN9N;a=5ON9N?O</foreign>
>   <lb/>(III), <foreign lang="grk">a<Oa=6 Oa?O N>B/N5B/ a=N;ON<ON9a=1N4N?O</foreign> (520/16)<foreign lang="grk">N a<1OON?ON9N?N3Oa=1ON?O. a?>NOa=9N4N?ON?O N4a=2 a= a?>NN;N9-</foreign>
>   <note place="margin">v</note>
>   <lb/> <foreign lang="grk">N:N1ON=N1ON5a=:O a= ON-N;N7ON1N9 ON?a=;ON?O, N=N5a==ON5ON?O a=$N=. N:N1a=6 a<&N= a<N:N?OOOa=4O N OO ON1N3a=9ON?O</foreign>
>   <note id="n7" n="7" lang="ger">
>     <foreign lang="grk">a=$N=N N3N-N3N?N=N5 N3a=0O N<N5Oa>= N1a=Oa=9N=</foreign>
>     A</note>
>   <lb/> <foreign lang="grk">a= a?>NN:N1ON1a?N?O. OOa?6ON?O N4a=2 a<1OON?Oa=7N1N= ON5N6a?6O a<N>a=5N=N5N3N:N5, OON3N3ON1Oa=4N= N4a=2 N&N5ON5N:a=;N4N7O</foreign>
>   <note id="n8b9" n="8b9" lang="ger"> <foreign lang="grk">OOa?6ON?ObN=N?N8N5a=;N5ON1N9</foreign> wiederholt s. <foreign lang="grk">a=6OON?Oa?ON1N9</foreign>, s. <foreign lang="grk">OON3N3ON1ON5a?O</foreign>.</note>
>   <lb/>(I 3). <foreign lang="grk">Oa=0 N3a=0O a>=NN:N?OON9N;a=1N?O</foreign>
>   (<link type="boj" targets="a002" n="BOJTEXT002_T_7">2 T 7</link>) <foreign lang="grk">N=N?N8N5a=;N5ON1N9.</foreign>
>   <note id="n9" n="9" lang="ger"> <foreign lang="grk">a>=NN:N?OON9N;a=1N?O</foreign> Vossius <foreign lang="grk">a>=NN3N7ON9N;a=1N?O</foreign> Suid</note>
> </ab>
> <!-- following::node() of ab -->
>
> All ab nodes appear at the same level (same depth) throughout.
>
> Any suggestions are welcome.
>
> Thanks,
> --
> Jeff
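
Point (c) of the reply suggests keys but does not show one. A minimal sketch of what that might look like follows, assuming the publishers_data.xml layout quoted above (each publisher holding one abbrev and one expanded). The key name publisher-by-abbrev, the variable publishers, and the named template make-abbr are hypothetical and not part of the original stylesheet. The sketch relies on the three-argument form of key() in XSLT 2.0, which lets the lookup run against the reference document rather than the source document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: the reference file is the one named in the original post. -->
  <xsl:variable name="publishers" as="document-node()"
      select="document('publishers_data.xml')"/>

  <!-- Index every publisher element by the string value of its abbrev child. -->
  <xsl:key name="publisher-by-abbrev" match="publisher" use="abbrev"/>

  <!-- Hypothetical helper: builds the abbr element for one abbreviation,
       using an indexed lookup in place of $abbreviations[.=$abbr]. -->
  <xsl:template name="make-abbr">
    <xsl:param name="abbr" as="xs:string"/>
    <abbr type="title"
        expand="{key('publisher-by-abbrev', $abbr, $publishers)/expanded}">
      <xsl:value-of select="$abbr"/>
    </abbr>
  </xsl:template>

</xsl:stylesheet>
```

With a key, the processor can build an index over the publisher elements once and reuse it, instead of scanning the whole $abbreviations sequence for every matched abbreviation. This only addresses the lookup named in (c); the per-text-node matches() filter over all abbreviations is a separate cost that a key does not remove.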