
Re: [xsl] Negative lookahead/behind in XSL regexp


Subject: Re: [xsl] Negative lookahead/behind in XSL regexp
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Tue, 10 May 2011 08:56:01 +0200

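A clumsy attempt to recognize "real" keywords. Improvements are
certainly possible, e.g., putting it all into a single function. The
basic idea is to split the text at every match of the elementary
keyword pattern, and then to test the end of the preceding chunk and
the start of the following chunk against the patterns that would have
gone into the lookbehind and lookahead assertions. In the example
below the keyword pattern is '[pqr]', and a keyword is rejected if it
is preceded by a or b (as with the lookbehind '(?<![ab])') or followed
by x, y or z (as with the lookahead '(?![xyz])').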

<?xml version="1.0"?>
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:wl="w/l"
                 version="2.0">

  <xsl:output indent="yes"/>

  <xsl:function name="wl:keywords" as="xsd:string*">
    <xsl:param name="string" as="xsd:string"/>
    <xsl:param name="kwd"    as="xsd:string"/>
    <xsl:analyze-string regex="{$kwd}" select="$string">
      <xsl:matching-substring>
        <xsl:sequence select="."/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>

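  <xsl:function name="wl:chunks" as="xsd:string*">
    <xsl:param name="string" as="xsd:string"/>
    <xsl:param name="kwd"    as="xsd:string"/>

    <!-- tokenize() also returns the zero-length chunks at the start, at
         the end and between adjacent keywords, so a non-empty text with
         n keywords always gives n + 1 chunks: chunk $i precedes keyword
         $i and chunk $i + 1 follows it. xsl:non-matching-substring drops
         those empty chunks, which shifts the pairing whenever the text
         begins with a keyword. -->
    <xsl:sequence select="tokenize($string, $kwd)"/>
  </xsl:function>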

  <xsl:template match="/">
    <xsl:apply-templates select="/*"/>
  </xsl:template>

  <xsl:template match="text">
   <xsl:variable name="keywords" as="xsd:string*"
                 select="wl:keywords(.,'[pqr]')"/>
   <xsl:variable name="chunks"   as="xsd:string*"
                 select="wl:chunks(.,'[pqr]')"/>

   <wl:res>
    <xsl:for-each select="$keywords">
      <xsl:variable name="i" select="position()"/>
      <xsl:variable name="before" as="xsd:string?"
                    select="$chunks[$i]"/>
      <xsl:variable name="after" as="xsd:string?"
                    select="$chunks[$i + 1]"/>
      <xsl:choose>
        <!-- preceded by a or b (the "negative lookbehind"): skip it -->
        <xsl:when test="matches($before,'[ab]$')"/>
        <!-- followed by x, y or z (the "negative lookahead"): skip it -->
        <xsl:when test="matches($after,'^[xyz]')"/>
        <xsl:otherwise>
          <wl:real-keyword>
            <xsl:value-of select="$keywords[$i]"/>
          </wl:real-keyword>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>

   </wl:res>
  </xsl:template>
</xsl:stylesheet>

When applied to

  <table>
    <text>p aq pz is q it p</text>
  </table>

the result is

  <wl:res xmlns:wl="w/l" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <wl:real-keyword>p</wl:real-keyword>
    <wl:real-keyword>q</wl:real-keyword>
    <wl:real-keyword>p</wl:real-keyword>
  </wl:res>
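Folding everything into one function, as suggested above, could look
roughly like the following minimal sketch, assuming an XSLT 2.0
processor such as Saxon. The function name wl:real-keywords and its
parameters $notBefore and $notAfter are made up for this illustration;
they take the patterns that would otherwise sit inside (?<!...) and
(?!...). The chunks come from tokenize() rather than from
xsl:non-matching-substring, because tokenize() keeps the zero-length
chunks at the start, at the end and between adjacent keywords, so
chunk $i is always the text before keyword $i and chunk $i + 1 the
text after it. It assumes none of the three patterns can match the
empty string, and since only the neighbouring chunk is inspected, a
context pattern that would have to look across another keyword is not
handled.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:wl="w/l"
                exclude-result-prefixes="xsd"
                version="2.0">

  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: the matches of $kwd in $string that are not
       immediately preceded by a match of $notBefore and not immediately
       followed by a match of $notAfter, roughly what
       (?<!notBefore)kwd(?!notAfter) would return in a regex dialect
       with lookaround. -->
  <xsl:function name="wl:real-keywords" as="xsd:string*">
    <xsl:param name="string"    as="xsd:string"/>
    <xsl:param name="kwd"       as="xsd:string"/>
    <xsl:param name="notBefore" as="xsd:string"/>
    <xsl:param name="notAfter"  as="xsd:string"/>

    <!-- the keywords, in order of occurrence -->
    <xsl:variable name="keywords" as="xsd:string*">
      <xsl:analyze-string regex="{$kwd}" select="$string">
        <xsl:matching-substring>
          <xsl:sequence select="."/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>

    <!-- the text around them, zero-length chunks included, so that
         chunk $i precedes keyword $i and chunk $i + 1 follows it -->
    <xsl:variable name="chunks" as="xsd:string*"
                  select="tokenize($string, $kwd)"/>

    <!-- keep keyword $i unless the end of the chunk before it matches
         $notBefore or the start of the chunk after it matches $notAfter;
         the parentheses keep any alternation inside the patterns intact -->
    <xsl:sequence select="
        for $i in 1 to count($keywords)
        return $keywords[$i]
                 [not(matches($chunks[$i], concat('(', $notBefore, ')$')))]
                 [not(matches($chunks[$i + 1], concat('^(', $notAfter, ')')))]"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:apply-templates select="//text"/>
  </xsl:template>

  <xsl:template match="text">
    <wl:res>
      <xsl:for-each select="wl:real-keywords(., '[pqr]', '[ab]', '[xyz]')">
        <wl:real-keyword>
          <xsl:value-of select="."/>
        </wl:real-keyword>
      </xsl:for-each>
    </wl:res>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the same <table> input it should again produce p, q and p,
this time without the xsd namespace declaration on wl:res, because of
exclude-result-prefixes.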

-W


On 9 May 2011 10:41, Andrew Welch <andrew.j.welch@xxxxxxxxx> wrote:
>
> On 9 May 2011 09:32, Clint Redwood <clint@xxxxxxxxxxxxxxx> wrote:
> > Hi,
> >
> > I'm using regexp in XSL to process input text data into a selection of
> > fixed values. This works for most of my requirements, but I have one
> > keyword which I need to exclude if preceded by one word or followed by
> > another. It appears that lookahead and lookbehind are not included in the
> > XSLT 2.0 specification, so I'm looking for an alternative way of doing it.
> >
> > The only way I've found so far involves lots of nested groups with negated
> > character classes and is very messy with long words.
> >
> > Anyone got a neat way of doing it?
>
> Hard to say without sample inputs and outputs... could you add another
> step after the regex processing and exclude the key word there?
>
>
> --
> Andrew Welch
> http://andrewjwelch.com

