
Re: [xsl] Re: backticks in regex - tales of the unexpected part II


Subject: Re: [xsl] Re: backticks in regex - tales of the unexpected part II
From: "Abel Braaksma (Exselt)" <abel@xxxxxxxxxx>
Date: Mon, 07 Apr 2014 18:58:33 +0200

On 7-4-2014 18:35, Ihe Onwuka wrote:
> Just going by the definition of the \w class in MK's XPath 2.0
> reference - \w -> a character considered to form part of a word

Essentially, it is the Unicode standard that decides this: \w matches
any character that is outside of \p{C}, \p{Z} and \p{P}. And I would find
it rather strange if "accent grave" were _not_ considered a possible part
of a word, similar to diaeresis, breve, cedilla etc. Its counterpart, the
acute accent, is categorized the same way. But not the apostrophe, which
is often considered an acute accent, but really isn't one.

I understand the confusion. Consider the math and currency symbols: the
same XSLT book you are quoting tells you that they are matched by \w as
well. How is $, + or > a word character? I don't know. I guess the
Unicode consortium just had to draw the line somewhere.
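
To make the categories concrete, here is a minimal Python sketch (not
XPath) that looks up the Unicode general category of the characters
mentioned above and applies the "outside P, Z and C" rule. The helper
name is_xsd_word_char is made up for illustration, and the category test
is only an emulation of \w as described here, not a call into any XPath
engine.

```python
import unicodedata


def is_xsd_word_char(ch: str) -> bool:
    """Hypothetical helper: True if ch is outside the P, Z and C categories."""
    # unicodedata.category returns a two-letter code such as 'Sk' or 'Po';
    # the first letter is the major class (P = punctuation, Z = separator,
    # C = other).
    return unicodedata.category(ch)[0] not in "PZC"


# Grave accent, acute accent, apostrophe, dollar, plus, greater-than.
for ch in "`\u00b4'$+>":
    print(repr(ch), unicodedata.category(ch), is_xsd_word_char(ch))
```

Under this rule the grave and acute accents (modifier symbols), the
dollar sign (currency symbol) and + and > (math symbols) all count as
word characters, while the apostrophe is punctuation and does not.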

> So it's TS if backtick isn't a word character in your vocabulary.
> Probably neither the first or the last to get caught by that one.

Not sure what TS means. But I'm sure you are not the last to get caught
by that one. Personally, I hardly ever use \w because I find it very
hard to understand what it does and does not match. Is the following a
word? Tell`>me$45).

I find it easiest to define the subranges myself, or use the
\p{Category} syntax, which I find clearer.
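
As a rough illustration of the difference, the Python sketch below
splits the sample string from above in two ways: once with an emulation
of \w (anything outside P, Z and C) and once with an explicit
letters-and-digits class, roughly what [\p{L}\p{N}] would express. The
function names and the choice of \p{L} plus \p{N} are assumptions for
the example; pick whatever categories fit your own data.

```python
import unicodedata


def runs(text, keep):
    """Split text into maximal runs of characters for which keep(ch) is true."""
    out, current = [], []
    for ch in text:
        if keep(ch):
            current.append(ch)
        elif current:
            out.append("".join(current))
            current = []
    if current:
        out.append("".join(current))
    return out


def xsd_word(ch):
    # Emulates \w as "not punctuation, not separator, not other".
    return unicodedata.category(ch)[0] not in "PZC"


def letter_or_digit(ch):
    # Emulates an explicit [\p{L}\p{N}] class (an assumed choice).
    return unicodedata.category(ch)[0] in "LN"


sample = "Tell`>me$45)."
print(runs(sample, xsd_word))         # ['Tell`>me$45']
print(runs(sample, letter_or_digit))  # ['Tell', 'me', '45']
```

With the \w emulation everything up to the closing parenthesis is a
single "word"; with the explicit categories the backtick, > and $ act as
separators.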

Cheers,
Abel

