Regular expression issue
Posted: Fri Mar 30, 2012 7:25 pm
Hi there,
I'm trying to find TEI <choice> tags with a particular configuration (where there is a long s in the <orig> tag and no long s in the <reg>). I have a regex like this:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>
But this fails to find anything. If I replace the second instance of "ſ" with another character, such as "a", it successfully finds things:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^a]+\s*</reg>\s*</choice>
In other words, it finds a tag like this:
<choice><orig>Reſte</orig><reg>Reste</reg></choice>
where there is an "ſ" in the <orig>, and no "a" in the <reg>. However, neither this:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^ſ]+\s*</reg>\s*</choice>
nor the Unicode numeric-escape version:
<choice>\s*<orig>(\w*ſ\w*)\s*</orig>\s*<reg>\s*[^\u017f]+\s*</reg>\s*</choice>
will find anything. It seems as though the "ſ" works OK when it's in a positive match, but fails when it's part of a negated character class. Am I missing something here, or is this a bug in the regex engine?
I'm using "Find in files", and the tags typically do not run over multiple lines. I have Oxygen 13.2 2012030716 running on Ubuntu Lucid 64-bit.
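To narrow this down outside Oxygen, here is a minimal standalone Java sketch. Oxygen is a Java application, so it tests one assumption: that "Find in files" uses java.util.regex and, when case-sensitive matching is off, compiles the pattern with Unicode-aware case folding (CASE_INSENSITIVE | UNICODE_CASE). Whether Oxygen 13.2 actually does this is not confirmed here. The class and method names (LongSRegexCheck, regexExcluding, finds) are hypothetical. The sketch runs the regex from this post against the sample tag, with three bodies for the negated class: the literal long s, the \u017f regex escape, and "a".

```java
import java.util.regex.Pattern;

// Hypothetical test harness; not part of Oxygen.
public class LongSRegexCheck {
    // U+017F LATIN SMALL LETTER LONG S, written as a Java escape so the source stays ASCII.
    static final String LONG_S = "\u017F";

    // The sample tag from the post: long s in <orig>, plain s in <reg>.
    static final String SAMPLE =
        "<choice><orig>Re" + LONG_S + "te</orig><reg>Reste</reg></choice>";

    // Builds the post's regex, using `excluded` as the body of the negated class in <reg>.
    static String regexExcluding(String excluded) {
        return "<choice>\\s*<orig>(\\w*" + LONG_S + "\\w*)\\s*</orig>\\s*<reg>\\s*[^"
            + excluded + "]+\\s*</reg>\\s*</choice>";
    }

    // True if the regex, compiled with the given flags, finds a match anywhere in SAMPLE.
    static boolean finds(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(SAMPLE).find();
    }

    public static void main(String[] args) {
        // Literal long s, the regex-level escape (a backslash-u sequence seen by Pattern), and "a".
        String[] excludedVariants = {LONG_S, "\\u017f", "a"};
        int[] flagSets = {0, Pattern.CASE_INSENSITIVE,
                          Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE};
        String[] flagNames = {"case-sensitive", "CASE_INSENSITIVE",
                              "CASE_INSENSITIVE | UNICODE_CASE"};
        for (String excluded : excludedVariants) {
            for (int i = 0; i < flagSets.length; i++) {
                System.out.printf("[^%s] with %s: %b%n", excluded, flagNames[i],
                                  finds(regexExcluding(excluded), flagSets[i]));
            }
        }
    }
}
```

On a standard JDK, all three variants should find the sample tag with the first two flag settings. Under CASE_INSENSITIVE | UNICODE_CASE, the two long-s variants should fail. The reason is that the uppercase of "ſ" is "S", so Unicode case folding treats "ſ", "s" and "S" as equivalent. [^ſ] then also excludes the ordinary "s" in "Reste". If this sketch reproduces the behaviour, it would be worth retrying the search in Oxygen with case-sensitive matching switched on.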