
Re: [xsl] XML text search & replace


Subject: Re: [xsl] XML text search & replace
From: Martin Honnen <Martin.Honnen@xxxxxx>
Date: Wed, 23 Mar 2011 15:10:19 +0100

a kusa wrote:
Hello

There is a requirement to search for a particular pattern in XML
documents and replace it by reading another XML file and copying
over the replacement text corresponding to the original text. I have
been trying to use <xsl:analyze-string> in XSLT 2.0, but I am not sure
how to read another XML file using this tag.

As an example, if I have some text tagged within <para> tags:

<para> this is a simple text</para>

I have an external xml file of the form:

<matchtext>simple</matchtext>
<replacetext>hard</replacetext>

In my <xsl:matching-substring>, can I use doc() to read the external
XML file and replace the text?

Yes, simply build a regular expression and use that. Here is a sample:


<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

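  <!-- URI of the replacements file; can be overridden when running the transformation -->
  <xsl:param name="rep-file" as="xsd:string" select="'test2011032302.xml'"/>
  <xsl:variable name="rep-doc" as="document-node()" select="doc($rep-file)"/>

  <!-- All matchtext values joined with '|' into a single alternation regex -->
  <xsl:variable name="rep-pattern" as="xsd:string"
    select="string-join($rep-doc/replacements/replacement/matchtext, '|')"/>

  <!-- Finds a replacement element by its matchtext value -->
  <xsl:key name="rep-key" match="replacement" use="matchtext"/>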

  <xsl:template match="para">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="para//text()">
    <xsl:analyze-string select="." regex="{$rep-pattern}">
      <xsl:matching-substring>
        <!-- Here . is the matched substring; look up its replacement in $rep-doc -->
        <xsl:value-of select="key('rep-key', ., $rep-doc)/replacetext"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

This assumes you have a file test2011032302.xml:

<replacements>
  <replacement>
    <matchtext>simple</matchtext>
    <replacetext>hard</replacetext>
  </replacement>
</replacements>
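
As a quick check of the sample, here is what I would expect for the para example from the question. Using the para element on its own as the whole input document is an assumption made for illustration only:

```xml
<!-- input document: the para example from the question -->
<para> this is a simple text</para>

<!-- expected result, ignoring the XML declaration -->
<para> this is a hard text</para>
```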


There are some shortcomings. Word boundary anchors like \b are not supported by the XSLT/XPath regular expression language, so it is difficult to prevent "simple" in "simpleminds" from being replaced. If your XSLT 2.0 processor is AltovaXML Tools, then I think it supports \b, however.
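
One possible way around the missing \b, sketched here under the assumption that every matchtext is a single word: split the text into runs of \w characters and only replace a run when the whole run is found by the key. It reuses the rep-file parameter and rep-key key from the sample above and is untested against any particular processor. Note that in XPath regular expressions \w excludes punctuation, so "simple-minded" would still have its "simple" part replaced.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

  <xsl:param name="rep-file" as="xsd:string" select="'test2011032302.xml'"/>
  <xsl:variable name="rep-doc" as="document-node()" select="doc($rep-file)"/>

  <xsl:key name="rep-key" match="replacement" use="matchtext"/>

  <xsl:template match="para">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="para//text()">
    <!-- Tokenize into whole words; everything else is a non-matching substring -->
    <xsl:analyze-string select="." regex="\w+">
      <xsl:matching-substring>
        <!-- Replace only if the complete word equals some matchtext -->
        <xsl:variable name="rep" select="key('rep-key', ., $rep-doc)/replacetext"/>
        <xsl:value-of select="if ($rep) then $rep[1] else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this variant "simpleminds" is one token and has no key entry, so it is copied unchanged, while a standalone "simple" is still replaced. It does not help if a matchtext contains spaces or punctuation.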
Another problem occurs if the matchtext contains characters that are metacharacters in regular expressions, like '?' or ')'; you would first need to escape them with a function like http://www.xsltfunctions.com/xsl/functx_escape-for-regex.html.
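
A minimal sketch of how that escaping could be wired into the sample. The my prefix, its namespace URI, and the function name my:escape-for-regex are made up for illustration; the function body follows the same replace() approach as the FunctX function. Only the rep-pattern variable changes, the key lookup still uses the unescaped matched text.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/my-functions"
  exclude-result-prefixes="xsd my"
  version="2.0">

  <!-- Hypothetical helper: put a backslash before each regex metacharacter -->
  <xsl:function name="my:escape-for-regex" as="xsd:string">
    <xsl:param name="arg" as="xsd:string"/>
    <xsl:sequence
      select="replace($arg, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')"/>
  </xsl:function>

  <xsl:param name="rep-file" as="xsd:string" select="'test2011032302.xml'"/>
  <xsl:variable name="rep-doc" as="document-node()" select="doc($rep-file)"/>

  <!-- Escape each matchtext before joining the alternatives with '|' -->
  <xsl:variable name="rep-pattern" as="xsd:string"
    select="string-join(
              for $m in $rep-doc/replacements/replacement/matchtext
              return my:escape-for-regex($m),
              '|')"/>

  <xsl:key name="rep-key" match="replacement" use="matchtext"/>

  <xsl:template match="para">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="para//text()">
    <xsl:analyze-string select="." regex="{$rep-pattern}">
      <xsl:matching-substring>
        <!-- The matched text is the literal matchtext, so the key still finds it -->
        <xsl:value-of select="key('rep-key', ., $rep-doc)/replacetext"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this in place a matchtext such as "why?" contributes "why\?" to the pattern, so the question mark is matched literally instead of acting as a quantifier.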



--


	Martin Honnen
	http://msmvps.com/blogs/martin_honnen/

