
Re: [xsl] Special characters in regex expression


Subject: Re: [xsl] Special characters in regex expression
From: "Wolfgang Laun wolfgang.laun@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 24 Jul 2014 04:46:53 -0000

On 23/07/2014, Michael Dykman mdykman@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> It is my understanding that Java's built-in regular expression support
> emulates 'pcre' pretty closely.

Perl 5 has, over time, added some rather unique features that aren't
available in Java. The XPath regex dialect is smaller still: it is based
on XML Schema regular expressions and lacks, among other things,
look-ahead.

>
> To escape characters that have special meaning in a regular
> expression, putting each one in a character class (using the square
> bracket notation) generally works
>
> i.e., if you want to match a question mark at the beginning of a line,
> use:  "^[?].*$"

Thus, regex="(\.|\!|\?)(?!\)|\.|\d|\w)" (ignoring the lack of look-ahead)
would be better written as

     regex="[.!?](?![).\d\w])" <!-- still not valid: XPath has no look-ahead -->

Without look-ahead, the character following the punctuation has to be
matched explicitly. It is then possible to select groups within the
matching substring:

     regex="([.!?])([^).\d\w])"

Thus, in this simple case it is possible to use regex-group(1) and
regex-group(2) to get the two characters individually, and insert nodes
as required.
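
A minimal sketch of that approach, based on Gabor's template. It assumes
that only the punctuation character belongs inside the seg element and
that the following character is written back as plain text; the identity
template is an addition made only so that the stylesheet is complete.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:TEI="http://www.tei-c.org/ns/1.0">

  <!-- Assumption: everything not handled below is copied through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="TEI:p//text()[not(parent::TEI:note | parent::TEI:hi | parent::TEI:date)]">
    <xsl:analyze-string select="." regex="([.!?])([^).\d\w])">
      <xsl:matching-substring>
        <!-- Group 1: the punctuation character, wrapped in seg. -->
        <xsl:element name="seg" namespace="http://www.tei-c.org/ns/1.0">
          <xsl:value-of select="regex-group(1)"/>
        </xsl:element>
        <!-- Group 2: the following character, which the regex had to
             consume; write it back unchanged. -->
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Two differences from a real look-ahead remain. Punctuation at the very
end of a text node is not matched, because group 2 requires a character.
And the character consumed by group 2 cannot itself start a new match,
so in "?!" only the "?" is wrapped.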

I am not sure what Gabor expects to happen with, e.g., "...??..." or
"...!!...", which are matched by this regex.

-W

>
> On Wed, Jul 23, 2014 at 3:55 PM, mike@xxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> The exclamation mark is not a special character in XPath regular
>> expressions, and therefore does not need to be (and must not be) escaped.
>>
>> Negative lookaheads are not supported in the XPath regular expression
>> dialect.
>>
>> You can't assume that all regular expression dialects are the same.
>>
>> Michael Kay
>>
>> Saxonica
>>
>>
>>
>>> Dear All,
>>>
>>> I am using xsl:analyze-string to retrieve and replace punctuation,
>>> however, I got the following error:
>>>
>>>  Error in regular expression: net.sf.saxon.trans.XPathException: Syntax
>>> error at char 6 in regular expression: Escape character '!' not allowed.
>>>
>>> How should I escape and match '?' and '!' ? I am also using a negative
>>> look-ahead, why isn't that working?
>>>
>>> Here is a sample from my code, thanks,
>>>
>>> Gabor
>>>
>>>
>>> <xsl:template match="//TEI:p//text()[ not
>>>         ((parent::TEI:note)|(parent::TEI:hi)|(parent::TEI:date))]">
>>>  <xsl:analyze-string select="." regex="(\.|\!|\?)(?!\)|\.|\d|\w)">
>>>
>>>             <xsl:matching-substring>
>>>
>>>                 <xsl:element name="seg"
>>> namespace="http://www.tei-c.org/ns/1.0"><xsl:value-of
>>> select="."/></xsl:element>
>>>            </xsl:matching-substring>
>>>             <xsl:non-matching-substring>
>>>                 <xsl:value-of select="."/>
>>>             </xsl:non-matching-substring>
>>>         </xsl:analyze-string>
>>>
>>
>
>
>
> --
>  - michael dykman
>  - mdykman@xxxxxxxxx
>
>  May the Source be with you.

