Regular expressions discrepancy
-
- Posts: 93
- Joined: Fri Mar 08, 2013 8:58 am
Regular expressions discrepancy
Take the string "|". When I test that string against \W in an XPath expression (say, in a stylesheet), there is no match. But there is a match when I search in the oXygen Find/Replace dialog box (i.e. Find: \W).
As far as I can tell from the official definition, http://www.w3.org/TR/xmlschema-2/#charcter-classes, the XPath result is right, since U+007C is tagged as Sm, a category that is not excluded from the class \w.
I would think that the oXygen search mechanism is wrong, or else the departure from the W3C definition is intentional, but not documented in the right place, i.e., http://www.oxygenxml.com/doc/versions/1 ... sions.html. Or am I off somewhere?
-
- Posts: 2879
- Joined: Tue May 17, 2005 4:01 pm
Re: Regular expressions discrepancy
Hi,
Yes, that is correct: the regular expression syntax accepted in XPath/XML Schema/Schematron differs from the one Oxygen uses for text searches in the Find/Replace dialogs.
The Find/Replace dialogs in Oxygen use Java regular expression syntax, which is based on Perl 5 regex, with some differences:
https://www.oxygenxml.com/doc/versions/ ... sions.html
Click on the link at the bottom of that page and you should find the definition of \w and \W:
http://docs.oracle.com/javase/6/docs/ap ... tml#predef
Code:
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
The regular expression syntax of the XPath functions is also based on Perl syntax, but it uses the XML Schema regex as the base, so there are some differences:
http://www.w3.org/TR/xpath-functions/#regex-syntax
This is the XML Schema "Regular Expressions" glossary:
http://www.w3.org/TR/xmlschema-2/#dt-ccesN
(look above the G Glossary for the definition of \W and \w)
Code:
\w [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)
\W [^\w]
So, in short, \w and \W are significantly different in the two implementations. If you want consistent results between the two, use [\p{P}\p{Z}\p{C}] in Oxygen instead of \W.
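As a rough illustration of the difference, here is a minimal Java sketch. It compares Java's predefined \W with the character class [\p{P}\p{Z}\p{C}], which stands in for the XML Schema definition of \W. The class and method names (WordClassDemo, found) are made up for this example and are not part of Oxygen, and the sketch uses plain java.util.regex, not an XPath engine.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Oxygen or any library.
class WordClassDemo {
    // Java's predefined \W: any character outside [a-zA-Z_0-9] (default mode).
    static final Pattern JAVA_NON_WORD = Pattern.compile("\\W");

    // Stand-in for the XML Schema \W: punctuation, separator and "other" characters.
    static final Pattern XSD_STYLE_NON_WORD = Pattern.compile("[\\p{P}\\p{Z}\\p{C}]");

    // Returns true if the pattern matches anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String pipe = "|";   // U+007C, general category Sm (math symbol)
        String hyphen = "-"; // U+002D, general category Pd (dash punctuation)

        System.out.println(found(JAVA_NON_WORD, pipe));        // true
        System.out.println(found(XSD_STYLE_NON_WORD, pipe));   // false
        System.out.println(found(JAVA_NON_WORD, hyphen));      // true
        System.out.println(found(XSD_STYLE_NON_WORD, hyphen)); // true
    }
}
```

The two patterns agree on the hyphen but disagree on "|", which is the discrepancy reported above: symbol characters count as non-word characters in Java, but not under the XML Schema definition.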
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
-
- Posts: 93
- Joined: Fri Mar 08, 2013 8:58 am
Re: Regular expressions discrepancy
Thank you for the thorough background. Would you be willing, next time you update the documentation, to include a modified form of this discussion in the material about regular expressions? The escape classes \w and \W are widely used, and a bit more prominence to the issue in oXygen documentation would be helpful to users. Thanks!
-
- Posts: 2879
- Joined: Tue May 17, 2005 4:01 pm
Re: Regular expressions discrepancy
Hi,
I've already submitted an issue for our documentation department to include this in the manual. I forgot to mention this in my previous post.
Regards,
Adrian