POSIX elements of PCRE
Hello
I have been finding some strange faults in the implementation of the POSIX bits of PCRE regular expressions when doing Find Replace in Files applied to a set of 58 files.
The files contain references to St. Paul, St. Peter, St. Anne, St. Aubyn, and St. George
(one space character, no trailing comma, of course)
with case sensitive and regex checked and no POSIX I find (correctly):
search for St. Paul found 8 occurrences (correct)
St. Peter 3
St. P 11
St. Anne 1
St. Aubyn 2
St. A 3
St. George 7
St. G 7
BUT if I use, for example, the POSIX standard [[:upper:]] as implemented in standard PCRE
with 'case sensitive' and regex (both boxes checked)
St. ([[:upper:]]) finds nothing
st. ([[:upper:]]) finds nothing
with 'NOT case sensitive' and regex (only the regex box checked)
St. ([[:upper:]]) finds 11 (all Peter and Paul, but NO Anne, Aubyn or George)
st. ([[:upper:]]) finds 11 (all Peter and Paul, but NO Anne, Aubyn or George)
I was under the impression you implemented PCRE including POSIX bits, but the POSIX bits don't seem to work very well.
Time to upgrade to full PCRE?
Ron
Re: POSIX elements of PCRE
Hi,
Note that Oxygen uses Java regular expression syntax (Perl 5 based), which does not support the POSIX character class syntax that you are using. See here the equivalent POSIX character class syntax used by Java/Oxygen:
Java - Class Pattern - POSIX character classes (US-ASCII only)
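In short, you need to use \p{Upper} instead of [[:upper:]] in your expression.
To clarify what's happening, your "[[:upper:]]" is treated as a plain character set, so the regexp engine looks for any one of the individual characters :, u, p, e, r. That is why the case-sensitive search finds nothing, while the case-insensitive search finds only the names starting with P (Paul and Peter).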
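The difference can be reproduced outside Oxygen with plain java.util.regex. This is only a minimal sketch: the class name PosixClassDemo, the helper count, and the sample text are made up for illustration, and the period is escaped as \. so that it matches a literal dot. It is not how Oxygen itself runs Find/Replace in Files.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PosixClassDemo {

    // Hypothetical helper: counts non-overlapping matches of regex in text.
    public static int count(String regex, String text, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Made-up sample text with one occurrence of each name.
        String text = "St. Paul, St. Peter, St. Anne, St. Aubyn, St. George";

        // Java reads [[:upper:]] as a set of the literal characters : u p e r.
        System.out.println(count("St\\. ([[:upper:]])", text, true));  // prints 0
        // Case-insensitively, the literal p also matches P (Paul, Peter).
        System.out.println(count("St\\. ([[:upper:]])", text, false)); // prints 2
        // The Java spelling of the POSIX class matches every upper-case initial.
        System.out.println(count("St\\. (\\p{Upper})", text, true));   // prints 5
    }
}
```

In the Oxygen search field the pattern is typed with single backslashes, for example St\. (\p{Upper}); the doubled backslashes above are only Java string escaping.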
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Re: POSIX elements of PCRE
Many thanks, Adrian. That clears up many related problems. I'm so used to PCRE.
Ron