RE: [xsl] Regular Expressions in XPath 2.0


Subject: RE: [xsl] Regular Expressions in XPath 2.0
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 23 Apr 2007 09:30:45 +0100

> I thought I'll share the above presentation as regular 
> expressions in XPath are based on Perl.

It's an interesting article. But of course the statement that "regular
expressions in XPath are based on Perl" has nothing to do with it: that's a
statement about the syntax, not about the implementation or choice of
algorithm.

The article is about different ways of implementing regular expressions. The
most interesting point it makes, I think, is that back-references turn
regular expressions into non-regular expressions, and that this requires the
use of backtracking implementation algorithms (whose worst-case running time
is exponential). But of course the vast majority of regexes do not use
back-references, so any of the classical automaton-based algorithms can be
used.
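
To make the contrast concrete, here is a minimal sketch in Python. It is not
taken from the article or from any real regex engine: the pattern format (a
list of (char, optional) pairs) and both function names are made up for
illustration. It matches the same back-reference-free pattern two ways, once
by naive backtracking and once by tracking the set of possible NFA positions,
and counts the work each one does.

```python
def backtrack_match(pattern, text):
    """Naive backtracking. pattern is a list of (char, optional) pairs.
    Returns (matched, number_of_recursive_calls)."""
    steps = 0

    def go(i, j):
        nonlocal steps
        steps += 1
        if i == len(pattern):
            return j == len(text)
        ch, optional = pattern[i]
        # Greedy: try to consume a character first ...
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        # ... and only then try skipping an optional item.
        return optional and go(i + 1, j)

    return go(0, 0), steps


def stateset_match(pattern, text):
    """Simulate the NFA by keeping the set of all positions it could be in.
    Returns (matched, number_of_state_inspections)."""
    steps = 0

    def closure(states):
        # Extend the set with positions reachable by skipping optional items.
        out, stack = set(), list(states)
        while stack:
            i = stack.pop()
            if i in out:
                continue
            out.add(i)
            if i < len(pattern) and pattern[i][1]:
                stack.append(i + 1)
        return out

    current = closure({0})
    for c in text:
        nxt = set()
        for i in current:
            steps += 1
            if i < len(pattern) and pattern[i][0] == c:
                nxt.add(i + 1)
        current = closure(nxt)
    return len(pattern) in current, steps


def pathological(n):
    """The pattern a?a?...a? aa...a (n optional, then n mandatory items)."""
    return [("a", True)] * n + [("a", False)] * n
```

Matching pathological(n) against "a" * n succeeds either way, but the
backtracker has to abandon every other combination of consume/skip choices
first, so it makes more than 2**n calls. The state-set version inspects at
most n * (2n + 1) states.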

There are some interesting trade-offs between the time taken to compile a
regular expression and the time taken to execute it. Determinizing an NFA
can be an expensive operation. This is rarely discussed in the theory, as
far as I can tell, though some of the papers do talk about incremental
determinization. You see this in schema processors, which validate an XML
document against a grammar by treating each content model as a regular
expression. Saxon creates a deterministic finite-state automaton (DFSA) for
this, which has excellent run-time performance, but in pathological cases
creating the DFSA can be extremely slow.
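
As a rough illustration of why determinization can be expensive, here is a
small Python sketch of the textbook subset construction. It is not Saxon's
code, and the function names and the choice of example language are my own
assumptions. The NFA accepts strings over {a, b} whose k-th symbol from the
end is "a". It needs only k + 1 states, but the equivalent DFA built below
has 2**k reachable states.

```python
from collections import deque


def kth_from_end_nfa(k):
    """NFA for (a|b)*a(a|b){k-1}: state 0 loops, states 1..k count symbols."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for s in range(1, k):
        delta[(s, "a")] = {s + 1}
        delta[(s, "b")] = {s + 1}
    return delta, 0, {k}


def determinize(delta, start, accepting, alphabet="ab"):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    table = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in table:
            continue  # already expanded
        table[current] = {}
        for sym in alphabet:
            target = frozenset(
                t for s in current for t in delta.get((s, sym), ())
            )
            table[current][sym] = target
            if target not in table:
                queue.append(target)
    dfa_accepting = {S for S in table if S & accepting}
    return table, start_set, dfa_accepting


def run_dfa(table, start_set, dfa_accepting, text):
    """Run the DFA: one table lookup per input symbol."""
    state = start_set
    for c in text:
        state = table[state][c]
    return state in dfa_accepting
```

Once the table exists, run_dfa does one dictionary lookup per input symbol,
which is where the good run-time performance comes from. The cost is all in
determinize: len(table) doubles each time k grows by one.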

Michael Kay
http://www.saxonica.com/

