[XSL-LIST Mailing List Archive Home]
[By Thread]
[By Date]
Michael Kay wrote:
I understand. Sounds reasonable and logical.
I cannot speak for .NET, but in the thread on the Saxon list I go deeper into this. It's not just about a spurious optimization feat or mismatch, it is about quite a common case.
In technical terms, you are right, however, not if you include the definition the astronomers have given about time: it ends some day. Jeffrey Friedl gives a calculation on page 227 of his 2nd ed book that a 50 char string with a bad regular expression, may run for a years. Here's the same with Java:
Meaning that if I apply such a regex to a corpus of over 85 characters, the process will stop long after the the whole universe has died out (current thesis: it will last about 30 billion years at most). Which is pretty long ;)
Calculation of this performance penalty: 2^(stringlen) * (time-one-char-match)
Java (Sun) is just a little behind with adding this optimization, because really, there is no need in testing any string more than string-len times, usually much less even.
-- Abel Braaksma
Re: [xsl] Backtracking and eternal loops caused by regular expressions matching: what to expect from implementations?
Subject: Re: [xsl] Backtracking and eternal loops caused by regular expressions matching: what to expect from implementations? From: Abel Braaksma <abel.online@xxxxxxxxx> Date: Wed, 24 Jan 2007 11:43:41 +0100 |
Michael Kay wrote:
(b) The specs have nothing to say about performance.
I understand. Sounds reasonable and logical.
(c) Saxon relies entirely on the regex engines in the underlying platform
(Java or .NET). I dare say there are cases where one regex engine will find
an optimization that another one misses.
I cannot speak for .NET, but in the thread on the Saxon list I go deeper into this. It's not just about a spurious optimization feat or mismatch, it is about quite a common case.
(d) I was under the impression that regular expression evaluation will
always terminate, though of course it's possible to construct cases where
that might require extreme patience to observe.
In technical terms, you are right, however, not if you include the definition the astronomers have given about time: it ends some day. Jeffrey Friedl gives a calculation on page 227 of his 2nd ed book that a 50 char string with a bad regular expression, may run for a years. Here's the same with Java:
25 char: 2.5 seconds 26 char: 5.3 seconds 27 char: 10.7 seconds 28 char: 21.4 seconds 35 char: 2739 seconds 40 char: 24.4 hours 50 char: almost 3 years 85 char: 97 * 10^9 years (97 billion years)
Meaning that if I apply such a regex to a corpus of over 85 characters, the process will stop long after the the whole universe has died out (current thesis: it will last about 30 billion years at most). Which is pretty long ;)
Calculation of this performance penalty: 2^(stringlen) * (time-one-char-match)
Java (Sun) is just a little behind with adding this optimization, because really, there is no need in testing any string more than string-len times, usually much less even.
-- Abel Braaksma
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
RE: [xsl] Backtracking and eternal , Michael Kay | Thread | [xsl] XSLT 2.0 has arrived, Michael Kay |
Re: [xsl] XSLT 2.0 has arrived, Andrew Welch | Date | [xsl] Regex Expression Problem, Byomokesh |
Month |