
Re: [xsl] analyze-string gotcha/reminder

Subject: Re: [xsl] analyze-string gotcha/reminder
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Mon, 19 Nov 2012 09:12:33 +0000

I feel your pain. Many of us have lost a few hairs over this one. The good news is that you probably won't make the same mistake again, or if you do, you will spot it far more quickly.

It's a case where even in retrospect, it's hard to see how we could have avoided this problem in the language design. Perhaps two separate attributes, regex and regex-avt. But that feels very heavy-handed. Most languages have a few quirks like this where people just have to learn the hard way.

Michael Kay

On 18/11/2012 18:18, Ihe Onwuka wrote:
Below is a multiple match meant to extract four-digit numbers from text:

    <xsl:analyze-string select="$line" regex="(\D|^)(\d{4})(\D|$)">

It doesn't work. I tried exactly the same regex in XQuery, using replace:

xquery version "1.0";
replace('Accounting Items                                Dec.31,2005
  Dec.31,2006    Dec.31,2007

It worked, and I got:

Accounting Items                                Dec.31xxxx
Dec.31xxxx   Dec.31xxxx   Dec.31xxxx

I thought maybe there was special syntax for the multiple match case - but no.
Eventually I turned to the specification and found this:

Because the regex attribute is an attribute value template, curly
brackets within the regular expression must be doubled. For example,
to match a sequence of one to five characters, write regex=".{{1,5}}".
For regular expressions containing many curly brackets it may be more
convenient to use a notation such as
regex="{'[0-9]{1,5}[a-z]{3}[0-9]{1,2}'}", or to use a variable.

So I had to double up my curly braces.
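
For the record, a sketch of what the doubled form would presumably look like. With single braces the attribute value template evaluates {4} as the XPath expression 4, so the processor ends up with the regex (\D|^)(\d4)(\D|$) and raises no error. The body of the instruction is not shown above, so the matching-substring part here is only an assumption for illustration:

    <xsl:analyze-string select="$line" regex="(\D|^)(\d{{4}})(\D|$)">
      <xsl:matching-substring>
        <!-- group 2 is the four-digit run -->
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>

The other route the specification mentions is to keep the regex undoubled in a variable and reference it from the attribute. The variable name four-digits is made up for this example:

    <!-- select is an XPath expression, not an AVT, so the braces stay single -->
    <xsl:variable name="four-digits" select="'(\D|^)(\d{4})(\D|$)'"/>
    <xsl:analyze-string select="$line" regex="{$four-digits}">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>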

There's an hour of my life that I won't get back.
