[XSL-LIST Mailing List Archive Home]
[By Thread]
[By Date]
The way that fn:matches usually works is that it attempts to apply the regular expression starting from each character in the string.
So, looking at the regular expression ".*?Expr" where Expr is another regular expression:
.*? is a reluctant match and so we try to match the shortest expression possible first.
Starting from the first character:
Try match the empty string for .*? and then try to mach Expr starting from the first character.
Try to match a single character for .*? and then try to match Expr starting from the second character.
Try to match two characters for .*? and then try to match Expr starting from the third character.
...
Thus in trying to find a match for ".*?Expr" starting at a particular character we try to match Expr starting at every other caharcter in the string, until a match is found.
fn:matches attempts to find a match for the regular expression starting from each character in the string. If a match isnt found starting at the first character then it will try again starting at the second and so on. So if there are no matches then Expr is tested n^2 times where n is the number of characters in the string, wheras matches applied to Expr would just test it n times.
Now, if your query was (note the '^'):
matches($title,'^.*?(\.|,)\s*$')
then I would expect perfomance to be very similar to:
matches($title,'(\.|,)\s*$')
It would be perfectly valid (and sensible) for a query processor to realise that the two expressions you gave were equivalent and so not perform n^2 tests, but I am unaware of a processor that makes these kinds of optimizations to regular expressions.
On 12/07/2011 17:26, Alex Muir wrote:
Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$'))
Subject: Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')) From: Oliver Hallam <oliver@xxxxxxxxxxx> Date: Wed, 13 Jul 2011 13:33:53 +0100 |
The way that fn:matches usually works is that it attempts to apply the regular expression starting from each character in the string.
So, looking at the regular expression ".*?Expr" where Expr is another regular expression:
.*? is a reluctant match and so we try to match the shortest expression possible first.
Starting from the first character:
Try match the empty string for .*? and then try to mach Expr starting from the first character.
Try to match a single character for .*? and then try to match Expr starting from the second character.
Try to match two characters for .*? and then try to match Expr starting from the third character.
...
Thus in trying to find a match for ".*?Expr" starting at a particular character we try to match Expr starting at every other caharcter in the string, until a match is found.
fn:matches attempts to find a match for the regular expression starting from each character in the string. If a match isnt found starting at the first character then it will try again starting at the second and so on. So if there are no matches then Expr is tested n^2 times where n is the number of characters in the string, wheras matches applied to Expr would just test it n times.
Now, if your query was (note the '^'):
matches($title,'^.*?(\.|,)\s*$')
then I would expect perfomance to be very similar to:
matches($title,'(\.|,)\s*$')
It would be perfectly valid (and sensible) for a query processor to realise that the two expressions you gave were equivalent and so not perform n^2 tests, but I am unaware of a processor that makes these kinds of optimizations to regular expressions.
Oliver Hallam XQSharp
On 12/07/2011 17:26, Alex Muir wrote:
Hi,
I'm wondering why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$'))
I found at least in one file out of thousands that I process the first one can take a good 30 minutes to complete and the second is quick.
No doubt the .*? causes some problem but what exactly is the problem that is causes?
Regards
-- Alex Muir Instructor | Program Organizer - University Technology Student Work Experience Building University of the Gambia http://sites.utg.edu.gm/alex/
Low budget software development benefiting development in the Gambia, West Africa Experience of a lifetime, come to Gambia and Join UTSWEB - http://sites.utg.edu.gm/utsweb/
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
[xsl] why matches($title,'.*?(\.|,), Alex Muir | Thread | Re: [xsl] why matches($title,'.*?(\, Alex Muir |
[xsl] why matches($title,'.*?(\.|,), Alex Muir | Date | Re: [xsl] why matches($title,'.*?(\, Alex Muir |
Month |