Regular expression case sensitivity

Posted: Fri May 23, 2025 1:24 pm
by Frank Ralf
Hi,

I am using the regex [a-z]-[a-z] to find unnecessary hyphens in text that was copied and pasted. Normally, this regex should match only lower-case letters. However, the Oxygen search also matches "Baden-Baden" unless I check the "Case sensitive" option. Is that the intended behavior or a bug?

Best regards,
Frank

Re: Regular expression case sensitivity

Posted: Fri May 23, 2025 3:24 pm
by adrian
Hi,

This is going to get a little technical, but, in short, it is as intended. I know it seems a bit off, but I'll explain...

The Java regex pattern compiler is case sensitive by default, but the "Case sensitive" option in the UI is binary: a search is either case sensitive or case insensitive, with no neutral position. As long as the box is cleared, the pattern is compiled as case insensitive (including Unicode case-insensitive matching), so it does not matter whether the character range in the regex uses uppercase or lowercase letters. The behavior you are expecting only applies when "Case sensitive" is checked.
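To make the effect concrete, here is a minimal sketch in plain Java. It is not Oxygen's actual code: the class name CaseSensitivityDemo and the helper found are made up for illustration, and the mapping of the cleared checkbox to the CASE_INSENSITIVE and UNICODE_CASE flags is an assumption based on the description above.

```java
import java.util.regex.Pattern;

public class CaseSensitivityDemo {

    // Hypothetical helper: mimics a search dialog with a "Case sensitive" checkbox.
    // Assumption: a cleared checkbox compiles the pattern with
    // CASE_INSENSITIVE | UNICODE_CASE; a checked one uses Java's default (no flags).
    static boolean found(String regex, String text, boolean caseSensitive) {
        int flags = caseSensitive
                ? 0
                : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String regex = "[a-z]-[a-z]";

        // Checkbox set: "n-B" does not match because 'B' is not in [a-z].
        System.out.println(found(regex, "Baden-Baden", true));   // false

        // Checkbox cleared: [a-z] also accepts 'B', so "n-B" matches.
        System.out.println(found(regex, "Baden-Baden", false));  // true

        // An all-lowercase hyphenation matches either way.
        System.out.println(found(regex, "copy-paste", true));    // true
    }
}
```

With the flags set, the range [a-z] effectively behaves like [a-zA-Z], which is why "Baden-Baden" is reported when the box is cleared.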

Regards,
Adrian

Re: Regular expression case sensitivity

Posted: Fri May 23, 2025 3:29 pm
by Frank Ralf
Hi Adrian,

Many thanks for your quick reply and the explanation. So the Java regex engine is the culprit. Good to be reminded that not all regex engines are created equal ;-)

Best regards,
Frank

Re: Regular expression case sensitivity

Posted: Fri May 23, 2025 4:52 pm
by adrian
We can't quite blame the regex engine. The engine and the UI work hand in hand here.
The UI is designed so that the average user can just search for something and have the best chance of getting a result, so our default search is case insensitive (box cleared). If you want to fine-tune it, you have to opt in to a more precise search.

Regards,
Adrian

Re: Regular expression case sensitivity

Posted: Fri May 23, 2025 4:55 pm
by Frank Ralf
Point taken ;-)