POSIX elements of PCRE

catterall
Posts: 63
Joined: Sat Jan 24, 2004 12:10 am
Location: Oaxaca, Mexico


Post by catterall »

Hello

I have been finding some strange faults in the implementation of the POSIX bits of PCRE regular expressions when doing Find/Replace in Files on a set of 58 files.
The files contain references to St. Paul, St. Peter, St. Anne, St. Aubyn, and St. George
(one space character, and no trailing comma, of course).
With 'case sensitive' and regex checked, and no POSIX classes, I find (correctly):
search for St. Paul found 8 occurrences (correct)
St. Peter 3
St. P 11
St. Anne 1
St. Aubyn 2
St. A 3
St. George 7
St. G 7

BUT if I use, for example, the POSIX standard [[:upper:]] as implemented in standard PCRE,
with 'case sensitive' and regex (both boxes checked)
St. ([[:upper:]]) finds nothing
st. ([[:upper:]]) finds nothing

with 'NOT case sensitive' and regex (only the regex box checked)
St. ([[:upper:]]) finds 11 (all Peter and Paul, but NO Anne, Aubyn or George)
st. ([[:upper:]]) finds 11 (all Peter and Paul, but NO Anne, Aubyn or George)

I was under the impression you implemented PCRE including POSIX bits, but the POSIX bits don't seem to work very well.
Time to upgrade to full PCRE?

Ron
adrian
Posts: 2855
Joined: Tue May 17, 2005 4:01 pm

Re: POSIX elements of PCRE

Post by adrian »

Hi,

Note that Oxygen uses Java regular expression syntax (Perl 5 based) which does not support the POSIX character classes syntax that you are using. See here the equivalent POSIX character classes syntax used by Java/Oxygen:
Java - Class Pattern - POSIX character classes (US-ASCII only)
In short, you need to use \p{Upper} instead of [:upper:] in your expression.

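To clarify what's happening, your "[:upper:]" is treated as a plain character set, so the regexp engine looks for any one of the individual characters :, u, p, e, r. That is why the case-sensitive search finds nothing, while the case-insensitive search matches only the names starting with P (Peter and Paul).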
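
If it helps, the behaviour can be reproduced outside Oxygen with plain java.util.regex. This is only a minimal sketch: the class name PosixClassDemo, the helper count, and the sample text are made up for illustration, and it assumes that Find/Replace in Files behaves like a standard Java Pattern with the corresponding flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PosixClassDemo {

    // Hypothetical helper: counts non-overlapping matches of regex in text.
    static int count(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Sample text invented for this sketch, one occurrence of each name.
        String text = "St. Paul, St. Peter, St. Anne, St. Aubyn, St. George";

        // Java reads [[:upper:]] as a set containing the characters : u p e r,
        // so a case-sensitive search finds no capital letter after "St. ".
        System.out.println(count("St. ([[:upper:]])", 0, text));

        // Case-insensitive: 'P' now matches the 'p' in the set (Paul, Peter).
        System.out.println(count("St. ([[:upper:]])", Pattern.CASE_INSENSITIVE, text));

        // The Java spelling of the POSIX class matches every capitalised name.
        System.out.println(count("St. (\\p{Upper})", 0, text));
    }
}
```

With this sample text the three searches should find 0, 2 and 5 matches respectively, which mirrors the pattern you saw across your 58 files.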

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
catterall
Posts: 63
Joined: Sat Jan 24, 2004 12:10 am
Location: Oaxaca, Mexico

Re: POSIX elements of PCRE

Post by catterall »

Many thanks, Adrian. That clears up many related problems. I'm so used to PCRE.
Ron