non-greedy regexp in Oxygen

Posted: Wed Jan 02, 2013 6:59 pm
by xinelo
Hi there,

Is it possible to use lazy/non-greedy regexp match in Oxygen?

I'm trying to match "trans-unit" elements that contain a "target" sub-element with the text "REMOVE", for example in:

Code:

<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">
<source>15.7 x 16.5 x 6.7in</source><target state="translated" state-qualifier="exact-match">"REMOVE"</target>
</trans-unit>

For that, I have successfully used this expression:

Code:

<trans-unit translate="yes".+?\n.+?"REMOVE"

In the sample code, you can see that there is one newline between the trans-unit opening tag and the source opening tag, hence the \n in my expression.

Code:

... xml:space="default"><-- NEWLINE HERE
<source>...

However, there may be an unknown number of newlines elsewhere in the text I want to match, so, rather than adding more \n to my regexp, I would like to use the expression:

Code:

<trans-unit translate="yes".+?"REMOVE"

with the "Dot matches all" option checked. The ? should make the .+ non-greedy; however, that part of the expression still matches several other complete trans-unit elements.
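To see why the explicit \n was needed without "Dot matches all", here is a minimal sketch that uses Python's re module as a stand-in for Oxygen's Find/Replace engine. It assumes that Oxygen's "Dot matches all" option behaves like Python's re.DOTALL flag; SAMPLE and the other names are illustrative only.

```python
import re

# Sample from the post: the opening tag and <source> are on separate lines.
SAMPLE = (
    '<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">\n'
    '<source>15.7 x 16.5 x 6.7in</source><target state="translated" '
    'state-qualifier="exact-match">"REMOVE"</target>\n'
    '</trans-unit>'
)

# With an explicit \n the newline is matched literally, so no flag is needed.
with_newline = re.compile(r'<trans-unit translate="yes".+?\n.+?"REMOVE"')

# Without \n, "." must be allowed to match newlines.
# Assumption: re.DOTALL stands in for Oxygen's "Dot matches all" option.
lazy = r'<trans-unit translate="yes".+?"REMOVE"'

print(bool(with_newline.search(SAMPLE)))         # True
print(bool(re.search(lazy, SAMPLE)))             # False: "." stops at the newline
print(bool(re.search(lazy, SAMPLE, re.DOTALL)))  # True
```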

Hence my question: Is it possible to use lazy/non-greedy regular expressions in Oxygen? If it is, what am I doing wrong?

Thank you very much!

Cheers, Manuel

Re: non-greedy regexp in Oxygen

Posted: Thu Jan 03, 2013 6:47 pm
by adrian
Hi,

What version of Oxygen are you using (look in Help > About)?

I've tested the expression you provided in the Find/Replace dialog of v14.1 (with "Dot matches all" enabled), and it seems to work as expected:

Code:

<trans-unit translate="yes".+?"REMOVE"

This is non-greedy and matches only one "trans-unit" at a time.

If I use

Code:

<trans-unit translate="yes".+"REMOVE"

without the "?", it becomes greedy and the match spans from the first "trans-unit" to the last "REMOVE" in the document.
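The lazy/greedy comparison can be sketched in Python as well. This is an illustration rather than Oxygen itself: trans_unit is a hypothetical helper that rebuilds the sample markup, the test document assumes every trans-unit contains "REMOVE" (as in the test described above), and re.DOTALL is again assumed to correspond to "Dot matches all".

```python
import re


def trans_unit(target_text: str) -> str:
    # Hypothetical helper: one trans-unit shaped like the sample in this thread.
    return (
        '<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">\n'
        '<source>15.7 x 16.5 x 6.7in</source><target state="translated" '
        f'state-qualifier="exact-match">{target_text}</target>\n'
        '</trans-unit>\n'
    )


# Assumed test document: every trans-unit contains "REMOVE".
doc = "".join(trans_unit('"REMOVE"') for _ in range(3))

lazy = re.compile(r'<trans-unit translate="yes".+?"REMOVE"', re.DOTALL)
greedy = re.compile(r'<trans-unit translate="yes".+"REMOVE"', re.DOTALL)

lazy_matches = lazy.findall(doc)
greedy_matches = greedy.findall(doc)

print(len(lazy_matches))    # 3: one per trans-unit
print(len(greedy_matches))  # 1: from the first <trans-unit to the last "REMOVE"
```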

Regards,
Adrian

Re: non-greedy regexp in Oxygen

Posted: Thu Jan 03, 2013 8:25 pm
by xinelo
Thank you very much for your answer, Adrian.

I am using version 14.1, build 201212121012 <- I could have waited a couple of hours... ;)

If you paste this code in a document:

Code:

<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">
<source>15.7 x 16.5 x 6.7in</source><target state="translated" state-qualifier="exact-match">"REMOVE"</target>
</trans-unit>
<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">
<source>15.7 x 16.5 x 6.7in</source><target state="translated" state-qualifier="exact-match">asdfa</target>
</trans-unit>
<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">
<source>15.7 x 16.5 x 6.7in</source><target state="translated" state-qualifier="exact-match">"adf"</target>
</trans-unit>
<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">
<source>15.7 x 16.5 x 6.7in</source><target state="translated" state-qualifier="exact-match">"REMOVE"</target>
</trans-unit>
What do you get matched with the non-greedy expression?

Code:

<trans-unit translate="yes".+?"REMOVE"

I get two matches: (1) the first trans-unit element and (2) the second, third and fourth trans-unit elements together. I expected two matches: (1) the first trans-unit element and (2) the fourth trans-unit element.
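Assuming again that Python's re.DOTALL behaves like Oxygen's "Dot matches all", a quick sketch on the four-element sample above gives the same two matches. trans_unit is a hypothetical helper that rebuilds the sample markup; the counts show how many trans-unit opening tags each match contains.

```python
import re


def trans_unit(target_text: str) -> str:
    # Hypothetical helper: one trans-unit shaped like the sample above.
    return (
        '<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">\n'
        '<source>15.7 x 16.5 x 6.7in</source><target state="translated" '
        f'state-qualifier="exact-match">{target_text}</target>\n'
        '</trans-unit>\n'
    )


targets = ['"REMOVE"', "asdfa", '"adf"', '"REMOVE"']
doc = "".join(trans_unit(t) for t in targets)

lazy = re.compile(r'<trans-unit translate="yes".+?"REMOVE"', re.DOTALL)
matches = [m.group() for m in lazy.finditer(doc)]

for i, m in enumerate(matches, 1):
    # Number of trans-unit opening tags covered by each match.
    print(i, m.count("<trans-unit"))
# 1 1
# 2 3
```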

Am I missing something?

Cheers,

Re: non-greedy regexp in Oxygen

Posted: Thu Jan 03, 2013 9:21 pm
by adrian
Hi,

Yes, that's exactly what happens with this particular content, and it is normal given the expression you are using. My test had a REMOVE string in every trans-unit, so it found each one in turn.
A non-greedy expression doesn't mean the search looks for the shortest possible match regardless of where it starts (I believe that's what you're expecting, but the match you expect is the one that starts farthest away). It means the search finds the first position, starting from the current search position (or from the end of the previous match), where a match can begin, and from there the lazy .+? extends only as far as the first "REMOVE" it reaches.
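A stripped-down illustration of this point, using Python's re module and made-up START/END markers instead of the XLIFF sample: the lazy quantifier changes only where a match ends, never where it starts.

```python
import re

text = "START one START two END three END"

# Lazy: the match still begins at the leftmost START (index 0),
# but stops at the first END it reaches.
lazy_match = re.search(r"START.+?END", text)

# Greedy: same start, but runs to the last END in the text.
greedy_match = re.search(r"START.+END", text)

print(lazy_match.start(), lazy_match.group())      # 0 START one START two END
print(greedy_match.start(), greedy_match.group())  # 0 START one START two END three END

# The shortest possible match, "START two END", is never returned,
# because a match starting earlier in the text is found first.
```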

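You can fine-tune this with XPath if you don't want a match to span several trans-unit elements.
In the XPath field, use: //trans-unit
This restricts the scope of the search to the trans-unit elements, so a match can no longer run from one trans-unit into the next.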
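Oxygen's XPath field is an editor feature, so the following is only a rough Python analogue of what restricting the scope to //trans-unit achieves, not how Oxygen implements it. It assumes the four-element sample from earlier in the thread, wrapped in a hypothetical <body> root so that it parses as XML, and runs the same regex separately on each serialized trans-unit.

```python
import re
import xml.etree.ElementTree as ET

# The four trans-units from the thread, wrapped in a hypothetical <body> root.
targets = ['"REMOVE"', "asdfa", '"adf"', '"REMOVE"']
units = "".join(
    '<trans-unit translate="yes" id="114" reformat="yes" xml:space="default">\n'
    '<source>15.7 x 16.5 x 6.7in</source><target state="translated" '
    f'state-qualifier="exact-match">{t}</target>\n'
    "</trans-unit>\n"
    for t in targets
)
root = ET.fromstring(f"<body>\n{units}</body>")

lazy = re.compile(r'<trans-unit translate="yes".+?"REMOVE"', re.DOTALL)

# Rough analogue of the //trans-unit scope: search each element on its own,
# so no match can cross from one trans-unit into the next.
# Python 3.8+ keeps attribute order when serializing, so translate="yes" stays first.
hits = []
for position, unit in enumerate(root.iter("trans-unit"), 1):
    fragment = ET.tostring(unit, encoding="unicode")
    if lazy.search(fragment):
        hits.append(position)

print(hits)  # [1, 4]
```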

Regards,
Adrian

Re: non-greedy regexp in Oxygen

Posted: Fri Jan 04, 2013 10:04 pm
by xinelo
Thank you very much, Adrian, for that lucid explanation. It seems my expectations were based on a misunderstanding of what non-greedy matching does.

Restricting the scope in the XPath field works fine. Thanks a lot! :)

Cheers, Manuel