
Re: [xsl] Processing two documents, which order?


Subject: Re: [xsl] Processing two documents, which order?
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 07 Apr 2011 15:25:55 +0100

On 07/04/2011 14:25, Dave Pawson wrote:

> I have two XML documents. The first is a list of marked-up words (1); the second is a 'normal' XML document (2).
>
> For each occurrence in (2) of a word from (1),
> I need to mark up the word with <property></property>.
>
> Which order is anywhere near optimum?
> Document 1 has about 300 words;
> document 2 is 33,000 lines.

I'm having trouble seeing how this description of the problem relates to the code given below.

From first principles, if you do a nested loop then you're doing either 300*33000 operations or 33000*300 - it's not a big difference either way. On the other hand, if you use keys, then you are basically doing 300+33000 operations either way - but the key will be smaller if you build it on the smaller document, so that's what I would do.
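A minimal XSLT 2.0 sketch of the key-based approach, building the key on the smaller document. It assumes, hypothetically, that document (1) is at `words.xml` with the structure `<words><word>...</word></words>` and holds single-word entries only; document (2) is the principal input. Each text node of (2) is split into words with a constant regex, and each word is looked up with the three-argument `key()` instead of looping over all ~300 entries. Matching is case-sensitive, which may or may not be what is wanted.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical location and structure of document (1):
       <words><word>alpha</word><word>beta</word>...</words> -->
  <xsl:variable name="words-doc" select="doc('words.xml')"/>

  <!-- Index built on the small document (about 300 entries). -->
  <xsl:key name="word" match="word" use="normalize-space(.)"/>

  <!-- Identity transform for document (2), the principal input. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- priority="1" beats the identity template, which also matches text(). -->
  <xsl:template match="text()" priority="1">
    <!-- Constant regex: runs of characters that are neither whitespace
         nor punctuation are treated as words. -->
    <xsl:analyze-string select="." regex="[^\s\p{{P}}]+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- One key lookup per word, not a loop over the word list. -->
          <xsl:when test="key('word', ., $words-doc)">
            <property><xsl:value-of select="."/></property>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Because the regex here is a literal, it is known statically, which matters for the compilation point made below. Words split across element boundaries (e.g. `<b>con</b>cat`) are not handled.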

Using regex matching with a dynamically computed regex looks like bad news - or is it really a regex in the source document? Saxon precompiles the regex if it's known statically, but if not there's no caching or anything - it gets compiled on each use. From this viewpoint, using each regex once (in a single analyze-string call) is going to be better.
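A sketch of the "one regex, one analyze-string call" idea, under similar hypothetical assumptions (`words.xml` containing `<word>` elements, at least one non-empty entry). All the words are escaped and joined into a single alternation, built once in a global variable, and each text node of document (2) is passed through one `xsl:analyze-string` instead of one call per word. Entries are sorted longest first because, when several alternatives match at the same position, the first one listed is chosen. The regex is still computed at run time, so it may be compiled once per text node; that is still far fewer compilations than one per word per element.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical location and structure of document (1):
       <words><word>alpha</word><word>New York</word>...</words> -->
  <xsl:variable name="words-doc" select="doc('words.xml')"/>

  <!-- Non-empty entries, longest first, so "New York" is tried before "New". -->
  <xsl:variable name="sorted-words" as="xs:string*">
    <xsl:perform-sort
        select="$words-doc//word[normalize-space(.)]/normalize-space(.)">
      <xsl:sort select="string-length(.)" order="descending"/>
    </xsl:perform-sort>
  </xsl:variable>

  <!-- One alternation, built once, with regex metacharacters escaped. -->
  <xsl:variable name="alternation" as="xs:string"
      select="string-join(
                for $w in $sorted-words
                return replace($w, '([.\\?*+^$|(){}\[\]\-])', '\\$1'),
                '|')"/>

  <!-- Identity transform for document (2). -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- priority="1" beats the identity template, which also matches text(). -->
  <xsl:template match="text()" priority="1">
    <!-- Branch 1: a listed word (group 2) followed by a space or punctuation
         character, or by the end of the text (group 3).
         Branch 2: any other whole word, consumed so that branch 1 is only
         ever tried at the start of a word. -->
    <xsl:analyze-string select="."
        regex="(({$alternation})([\s\p{{P}}]|$))|[^\s\p{{P}}]+">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test="regex-group(2) ne ''">
            <property><xsl:value-of select="regex-group(2)"/></property>
            <xsl:value-of select="regex-group(3)"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the key-based version, this one also handles multi-word entries. If the word list were empty, the regex could match a zero-length string, which `xsl:analyze-string` rejects as a dynamic error.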

Michael Kay
Saxonica
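> This is the template to do the work:
>
>   <xsl:template match="*">
>     <xsl:param name="property" as="xs:string"/>
>     <!-- select="." takes the element's string value, so any child
>          markup is flattened. Group 1 is the word; group 2 is the
>          space or punctuation character after it, written back out
>          so that it is not lost from the output. -->
>     <xsl:analyze-string select="." regex="({$property})([\s\p{{P}}])">
>       <xsl:matching-substring>
>         <!-- <xsl:message>match on [<xsl:value-of select="regex-group(1)"/>]</xsl:message> -->
>         <property><xsl:value-of select="regex-group(1)"/></property>
>         <xsl:value-of select="regex-group(2)"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         <xsl:copy-of select="."/>
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>   </xsl:template>
>
> but I'm unsure which loop order will work best.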

