Regular expression issue

martindholmes
Posts: 178
Joined: Wed Apr 20, 2005 5:43 pm
Location: Victoria, BC, Canada

Post by martindholmes »

Hi there,

I'm trying to find TEI <choice> tags with a particular configuration (where there is a long s in the <orig> tag and no long s in the <reg> tag). I have a regex like this:

<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>

But this fails to find anything. If I replace the second instance of "ſ" with another character, such as "a", it successfully finds things:

<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^a]+\s*</reg>\s*</choice>

In other words, it finds a tag like this:

<choice><orig>Reſte</orig><reg>Reste</reg></choice>

where there is an "ſ" in the <orig>, and no "a" in the <reg>. However, neither this:

<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>

nor the Unicode numeric escape version:

<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^\u017f]+\s*</reg>\s*</choice>

will find anything. It seems as though the "ſ" works OK when it's in a positive match, but fails when it's part of a negated character class. Am I missing something here, or is this a bug in the regex engine?

I'm using "Find in files", and the tags typically do not run over multiple lines. I have Oxygen 13.2 2012030716 running on Ubuntu Lucid 64-bit.
adrian
Posts: 2855
Joined: Tue May 17, 2005 4:01 pm

Re: Regular expression issue

Post by adrian »

Hello,

Apologies for the late reply.

The problem is that ſ ("long s") is treated as equivalent to "s" when the search is case insensitive (the default). This happens because Oxygen configures the regular expression engine to be Unicode case insensitive.
For example, "s", "S" and all Unicode case variants of "s" (including ſ) are considered equivalent when the search is case insensitive. As a result, the negated class [^ſ] also excludes "s", so it cannot match a <reg> such as "Reste".

This can be easily resolved by enabling the "Case sensitive" option in the Oxygen dialog.
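
For anyone who wants to reproduce the effect outside Oxygen, here is a minimal sketch using Python's `re` module. This is only a stand-in: Oxygen uses its own regex engine, and the assumption here is that Python's Unicode-aware `re.IGNORECASE` behaves analogously for the long s. The names `PATTERN` and `SAMPLE` are just illustrative; the pattern and the sample tag are the ones from the original post.

```python
import re

# Pattern from the original post, with the long s inside a negated class.
PATTERN = r"<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>"

# Sample tag from the original post.
SAMPLE = "<choice><orig>Reſte</orig><reg>Reste</reg></choice>"

# Case-insensitive search: "ſ" and "s" are treated as equivalent,
# so [^ſ] also refuses to match the "s" in "Reste".
insensitive = re.search(PATTERN, SAMPLE, re.IGNORECASE)

# Case-sensitive search: [^ſ] excludes only the long s itself.
sensitive = re.search(PATTERN, SAMPLE)

print("case-insensitive match:", insensitive)
print("case-sensitive match:", sensitive.group(1) if sensitive else None)
```

Under that assumption, the case-insensitive search should find nothing, while the case-sensitive one should find the tag. That mirrors what happens when the "Case sensitive" option is switched on in the Oxygen dialog.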

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com