POSIX elements of PCRE
			- 
				catterall
 - Posts: 63
 - Joined: Sat Jan 24, 2004 12:10 am
 - Location: Oaxaca, Mexico
 
POSIX elements of PCRE
Hello
I have been finding some strange faults in the implementation of the POSIX bits of PCRE regular expressions when doing Find/Replace in Files applied to a set of 58 files.
The files contain references to St. Paul, St. Peter, St. Anne, St. Aubyn, and St. George
(one space character, no trailing comma, of course).
With 'case sensitive' and regex checked and no POSIX class, I find (correctly):
St. Paul 8
St. Peter 3
St. P 11
St. Anne 1
St. Aubyn 2
St. A 3
St. George 7
St. G 7
BUT if I use, for example, the POSIX standard [[:upper:]] as implemented in standard PCRE
with 'case sensitive' and regex (both boxes checked)
St. ([[:upper:]]) finds nothing
st. ([[:upper:]]) finds nothing
with 'NOT case sensitive' and regex (only the regex box checked)
St. ([[:upper:]]) finds 11 (all Peter and Paul, but NO Anne, Aubyn or George)
st. ([[:upper:]]) finds 11 (all Peter and Paul, but NO Anne, Aubyn or George)
I was under the impression you implemented PCRE including POSIX bits, but the POSIX bits don't seem to work very well.
Time to upgrade to full PCRE?
Ron
			
			
									
									
- 
				adrian
 - Posts: 2894
 - Joined: Tue May 17, 2005 4:01 pm
 
Re: POSIX elements of PCRE
Hi,
Note that Oxygen uses Java regular expression syntax (Perl 5 based), which does not support the POSIX bracket syntax for character classes that you are using. See here the equivalent POSIX character classes in the syntax used by Java/Oxygen:
Java - Class Pattern - POSIX character classes (US-ASCII only)
In short, you need to use \p{Upper} instead of [:upper:] in your expression.
To clarify what's happening, your "[:upper:]" is treated as a plain character set, so the regexp engine looks for those individual characters: ':', 'u', 'p', 'e', 'r'. That is also why the case-insensitive search matched only the names starting with P (the 'p' in the set matches 'P'), and none starting with A or G.
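Here is a minimal sketch of the difference, using java.util.regex directly rather than Oxygen's Find/Replace dialog. The class name PosixClassDemo, the count helper, and the one-line sample text are made up for illustration, so the counts are for this sample only, not for your 58 files. I also escaped the dot as \. so that it matches a literal period.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PosixClassDemo {
    // Hypothetical sample text: one occurrence of each name from the post.
    static final String SAMPLE =
        "St. Paul, St. Peter, St. Anne, St. Aubyn, St. George";

    // Counts non-overlapping matches of the regex in the text.
    static int count(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // In Java, [[:upper:]] is a plain set containing ':', 'u', 'p', 'e', 'r'.
        // Case-sensitive: no capital letter is in that set.
        System.out.println(count("St. ([[:upper:]])", 0, SAMPLE)); // prints 0

        // Case-insensitive: 'p' in the set now also matches 'P' (Paul, Peter).
        System.out.println(
            count("St. ([[:upper:]])", Pattern.CASE_INSENSITIVE, SAMPLE)); // prints 2

        // Java's spelling of the POSIX class: \p{Upper} matches any ASCII capital.
        System.out.println(count("St\\. (\\p{Upper})", 0, SAMPLE)); // prints 5
    }
}
```

In the Find/Replace dialog itself the backslashes are not doubled, so the last pattern would be typed as St\. (\p{Upper}) with 'case sensitive' checked.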
Regards,
Adrian
			
			
									
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
- 
				catterall
 - Posts: 63
 - Joined: Sat Jan 24, 2004 12:10 am
 - Location: Oaxaca, Mexico
 
Re: POSIX elements of PCRE
Many thanks, Adrian. That clears up many related problems. I'm so used to PCRE.
Ron
			
			
									
									