
Re: [xsl] Support for lookaround regexp in XSLT -- any time soon?


Subject: Re: [xsl] Support for lookaround regexp in XSLT -- any time soon?
From: James Fuller <james.fuller.2007@xxxxxxxxx>
Date: Fri, 1 Mar 2013 18:24:46 +0100

thx for the background info, it's useful and interesting to hear about.

btw

    http://en.wikipedia.org/wiki/Regular_expression

does a good job at identifying regex specs/docs

but I would argue that perl6

https://github.com/perl6/specs/blob/master/S05-regex.pod

does the best job at unambiguously defining ... though this regex is
not your grandad's regex.

back to regex in XML land ... to add a datapoint: I think the only
oft-repeated shortcoming of regex in XML is the lack of lookahead/lookbehind
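
To make that concrete, here is a minimal sketch of what lookahead and
lookbehind buy you. It uses Python's re module purely as an illustration
(Python is not part of this thread), and the sample string is invented:

```python
import re

# Hypothetical sample input, invented for this illustration.
text = "price: 100USD, weight: 100kg, cost: 250USD"

# Positive lookahead: digits only when followed by "USD";
# "USD" itself is not part of the match.
usd_amounts = re.findall(r"\d+(?=USD)", text)

# Positive lookbehind: digits only when preceded by "cost: ".
cost = re.findall(r"(?<=cost: )\d+", text)

# Negative lookahead: a run of digits that is not followed by "USD".
# The extra \d alternative stops the engine from backtracking to a
# shorter run such as "10" inside "100USD".
non_usd = re.findall(r"\d+(?!\d|USD)", text)

print(usd_amounts, cost, non_usd)
```

Without lookaround you would have to match the surrounding text as well and
pull the digits out with a capture group afterwards.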

J



On Fri, Mar 1, 2013 at 11:40 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
>
>> unsure about the original reason to restrict regex, as it seems to
>> just confuse people when a regex they lovingly crafted elsewhere
>> doesn't work (not that the various java, Perl, etc schisms help).
>
> I don't know the history in full, but I think there were several reasons XSD
> adopted a "minimal" regex subset:
>
> (a) they wanted to be sure it could be widely implemented using existing
> regex engines (i.e. a highest common factor approach)
>
> (b) they wanted to exclude anything that didn't make sense in an
> international Unicode context (so things like word boundaries were
> immediately suspect)
>
> (c) they wanted to make sure that what they included was well specified.
> Finding solid specifications of regex constructs is remarkably difficult;
> there's a culture of very informal specification. Many times when adding
> constructs to the XPath spec, we've had to do empirical tests on existing
> regex engines such as PCRE to see how they actually handle edge cases, and
> very often we find differences between different engines that couldn't be
> guessed from the documentation. For example, there's a sorry history of
> patches to the spec regarding the handling of a newline character appearing
> as the last thing in the input. It's a shame when a feature gets left out
> because we can't decide what it should do in edge cases, but the standards
> process tends to lead to people asking such questions and expecting answers.
>
> Michael Kay
> Saxonica
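
As a small illustration of the trailing-newline edge case mentioned above,
the sketch below shows how one particular engine (Python's re module, chosen
only as an example) treats a newline that is the last character of the
input. Other engines, and the XPath spec itself, may well decide these cases
differently; the sample strings are invented:

```python
import re

# Hypothetical input whose final character is a newline.
subject = "abc\n"

# In Python, "$" (without MULTILINE) also matches just before a single
# trailing newline, so this search succeeds.
dollar_matches = re.search(r"abc$", subject) is not None

# "\Z" matches only at the very end of the string, so this search fails.
end_anchor_matches = re.search(r"abc\Z", subject) is not None

# fullmatch must consume the whole string, including the newline.
full_matches = re.fullmatch(r"abc", subject) is not None

# With two trailing newlines, "$" no longer matches right after "abc".
double_newline_matches = re.search(r"abc$", "abc\n\n") is not None

print(dollar_matches, end_anchor_matches, full_matches, double_newline_matches)
```

None of these outcomes is obvious from the pattern alone, which is the kind
of detail a spec has to pin down explicitly.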

