Re: [xsl] Processing two documents, which order?
Subject: Re: [xsl] Processing two documents, which order?
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 07 Apr 2011 15:25:55 +0100
On 07/04/2011 14:25, Dave Pawson wrote:
I'm having trouble seeing how this description of the problem relates to the code given below.
I have two XML documents. The first is a list of marked up words (1), the second a 'normal' XML document (2).
For each occurrence in 2 of a word from 1 I need to mark up the word with <property> </property>
Which order is anywhere near optimum? Document 1 has about 300 words, Document 2 is 33,000 lines.
From first principles, if you do a nested loop then you're doing either 300*33000 operations or 33000*300 - it's not a big difference either way. On the other hand, if you use keys, then you are basically doing 300+33000 operations either way - but the key will be smaller if you build it on the smaller document, so that's what I would do.
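To make the arithmetic concrete: the nested loop is about 9,900,000 comparisons, while the keyed version is on the order of 33,300 operations. A minimal sketch of the keyed approach follows. It is not code from the thread: the file name words.xml, its <words><word>...</word></words> layout, and the key name word-by-text are all assumptions for illustration, and it assumes each entry is a single word with no spaces.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the word list is words.xml, next to the stylesheet,
       shaped as <words><word>...</word></words> -->
  <xsl:variable name="word-list" select="document('words.xml')"/>

  <!-- Key over <word> elements; it is only ever used against the
       small document (about 300 entries) -->
  <xsl:key name="word-by-text" match="word" use="normalize-space(.)"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split each text node of the large document into word and
       non-word runs, then do one key lookup per word -->
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="\w+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- Third argument makes key() search the word list
               instead of the (absent) context document -->
          <xsl:when test="exists(key('word-by-text', ., $word-list))">
            <property><xsl:value-of select="."/></property>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Here the regex is a static \w+, so nothing is computed dynamically, and the word list is consulted only through the key. Multi-word entries would need a different tokenisation.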
Using regex matching with a dynamically computed regex looks like bad news - or is it really a regex in the source document? Saxon precompiles the regex if it's known statically, but if not there's no caching or anything - it gets compiled on each use. From this viewpoint, using each regex once (in a single analyze-string call) is going to be better.
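One possible variation on this point, which goes beyond what the post itself proposes, is to fold all the words into a single alternation so that each text node needs one analyze-string call rather than one per word. The sketch below assumes a hypothetical words.xml shaped as <words><word>...</word></words>, that the list is non-empty, and that no entry contains regex metacharacters. The regex is still computed dynamically, so it is still compiled per call, just far fewer times.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: words.xml holds <words><word>...</word></words>,
       is non-empty, and no entry contains regex metacharacters -->
  <xsl:variable name="alternation" as="xs:string"
      select="string-join(
                document('words.xml')/words/word/normalize-space(.),
                '|')"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One analyze-string call per text node, covering every word -->
  <xsl:template match="text()">
    <xsl:analyze-string select="."
        regex="({$alternation})([\s\p{{P}}])">
      <xsl:matching-substring>
        <property><xsl:value-of select="regex-group(1)"/></property>
        <!-- Re-emit the delimiter that the regex consumed -->
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Like the template in the original question, this only requires a space or punctuation character after the word, so it can also match the tail of a longer word and will miss a word at the very end of a text node. Unlike that template, it writes the trailing delimiter back out.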
Michael Kay Saxonica
This is the template to do the work
<xsl:template match="*">
  <xsl:param name="property" as="xs:string"/>
  <xsl:analyze-string select="." regex="({$property})[\s\p{{P}}]">
    <xsl:matching-substring>
      <!-- <xsl:message>match on [<xsl:value-of select='regex-group(1)'/>]</xsl:message> -->
      <property><xsl:value-of select="regex-group(1)"/></property>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:copy-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
but I'm hesitating as to which loop sequence will work best?