odd regex behavior

RIH
Posts: 3
Joined: Mon Jun 16, 2014 6:08 pm

Post by RIH »

I'm running XQuery 3.1 with oXygen 26.1 / Saxon EE 12.3 and getting some surprising behavior when using regular expressions. It seems that the regex engine is treating optional capturing groups as non-capturing groups? Am I missing something?

To reproduce: in the XPath dialog, the XPath/XQuery Builder dialog, or an XQuery script, with the XPath/XQuery version set to 3.1 in each case, run

replace("abc1234-5678abcd", "(.+)(\d{4})(\-\d{4})(.*)", "$2", "i")
which returns the expected "1234". But when you make one of the groups optional and try to return it, e.g.

replace("abc1234-5678abcd", "(.+)(\d{4})?(\-\d{4})(.*)", "$2", "i")
it returns an empty string. I expected it to return "1234", since a match is present. It strikes me as a bug, but my apologies if I'm just thinking about the regex wrong!
teo
Posts: 81
Joined: Wed Aug 30, 2017 3:56 pm

Re: odd regex behavior

Post by teo »

Hello,

You can get a detailed explanation of the reported case by asking an AI assistant.
Copy the message you posted above and paste it into the ChatGPT window, for example.
The response also includes a proposed fix/workaround.

Additional note: the response received from ChatGPT is well formatted in HTML and easy to read and understand.
It would have lost some of its clarity if I had posted it directly here.

Regards,
Teo
Teodor Timplaru
<oXygen/> XML Editor
http://www.oxygenxml.com
RIH
Posts: 3
Joined: Mon Jun 16, 2014 6:08 pm

Re: odd regex behavior

Post by RIH »

Thank you. AI says to use lookahead. I understand that's a possibility. Still, it seems something has changed in the regex engine so that it does not capture optional capturing groups even when the pattern is present?

Update for the possible benefit of others: Answering my own question with the excellent guidance on regular-expressions.info:

oXygen does support backreferences to non-participating capturing groups (in fact, with or without the Saxon 'j' flag per my experimentation).

As to my problem, I needed to make the quantifier in the non-optional capturing group 1 lazy, i.e. (.+?) instead of (.+). Otherwise the greedy group 1 consumes the characters I expected to be matched by the optional capturing group 2, and group 2 is left matching nothing.
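
For anyone who wants to compare the two patterns side by side, here is a minimal sketch. The input string, patterns, and "i" flag are taken from the posts above. The variable names ($input, $greedy, $lazy) are only illustrative, and the version declaration assumes the XQuery 3.1 setting mentioned earlier.

```xquery
xquery version "3.1";

(: Input string from the original post. :)
let $input := "abc1234-5678abcd"

(: Greedy group 1: (.+) takes "abc1234", so the optional group 2
   matches nothing and $2 is replaced by a zero-length string. :)
let $greedy := replace($input, "(.+)(\d{4})?(\-\d{4})(.*)", "$2", "i")

(: Lazy group 1: (.+?) stops after "abc", leaving "1234" for group 2. :)
let $lazy := replace($input, "(.+?)(\d{4})?(\-\d{4})(.*)", "$2", "i")

(: Expected result: "", "1234" :)
return ($greedy, $lazy)
```

The only difference between the two calls is the ? after .+ in group 1. The optional group 2 is still a capturing group in both. It just has nothing left to capture in the greedy version.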

Thanks again for your assistance.