Regular expression issue
Post by martindholmes
Hi there,
I'm trying to find TEI <choice> tags with a particular configuration (where there is a long s in the <orig> tag and no long s in the <reg>). I have a regex like this:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>
But this fails to find anything. If I replace the second instance of "ſ" with another character, such as "a", it successfully finds things:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^a]+\s*</reg>\s*</choice>
In other words, it finds a tag like this:
<choice><orig>Reſte</orig><reg>Reste</reg></choice>
where there is an "ſ" in the <orig>, and no "a" in the <reg>. However, neither this:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>
nor the unicode-numeric-escape version:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^\u017f]+\s*</reg>\s*</choice>
will find anything. It seems as though the "ſ" works OK when it's in a positive match, but fails when it's part of a negated character class. Am I missing something here, or is this a bug in the regex engine?
I'm using "Find in files", and the tags typically do not run over multiple lines. I have Oxygen 13.2 2012030716 running on Ubuntu Lucid 64-bit.
Re: Regular expression issue
Hello,
Apologies for the late reply.
The problem is that ſ ("long s") is treated as equivalent to "s" when the search is case insensitive (the default). This happens because Oxygen configures the regular expression engine to be Unicode case insensitive.
For example, "s", "S" and the other Unicode case variants of "s" (including "ſ") are considered equivalent when the search is case insensitive. As a result, the negated class [^ſ] also excludes "s" and "S", so it cannot match the "s" in a <reg> such as "Reste".
This can be easily resolved by enabling the "Case sensitive" option in the Oxygen dialog.
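The effect can be reproduced outside Oxygen. The following is only a rough sketch using Python's re module, which is not the engine Oxygen uses but applies a similar Unicode case-insensitive equivalence between "ſ" and "s" when re.IGNORECASE is set. The pattern and the sample tag are taken from the post above; the variable names are made up for illustration.

```python
import re

# Pattern and sample tag copied from the original post.
PATTERN = r"<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>"
SAMPLE = "<choice><orig>Reſte</orig><reg>Reste</reg></choice>"

# Case-insensitive search: "ſ" and "s" are treated as equivalent,
# so the negated class [^ſ] also rejects the "s" in "Reste".
insensitive = re.search(PATTERN, SAMPLE, re.IGNORECASE)

# Case-sensitive search (Python's default): [^ſ] only rejects "ſ" itself.
sensitive = re.search(PATTERN, SAMPLE)

print("case insensitive match:", insensitive is not None)
print("case sensitive match:", sensitive is not None)
if sensitive is not None:
    # Group 1 is the content of <orig> captured by (\w*ſ\w*).
    print("captured <orig> text:", sensitive.group(1))
```

In this sketch the case-insensitive search should find nothing, while the case-sensitive one should match the sample tag, which mirrors what the "Case sensitive" option changes in the Oxygen dialog.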
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com