Need workaround for lookarounds in XPATH regex

Posted: Wed Mar 08, 2017 2:15 pm
by xephon
Hi,

it seems that the regular expressions in <sqf:stringReplace/> (XSLT/Schematron) do not support lookarounds. I need a way to match words that are not part of a longer string. Without lookarounds (e.g. foo(?=\s)), this gets very ugly.

e.g., when trying to match 'foo', this should match:

lorem ipsum foo bar.
but this should not match, because 'foo' is a substring of 'foobar'.

lorem ipsum foobar.
I work around this by also matching the preceding and following characters. This is really dirty, but I cannot find a better solution. Do you have an implementation idea for me?
<sqf:fix id="replace-foo">
    <sqf:description>
        <sqf:title>Replace 'foo' with an allowed term: 'bar'</sqf:title>
    </sqf:description>
    <sqf:stringReplace regex="(\sfoo$)" select="' bar'"/>
    <sqf:stringReplace regex="(^foo\s)" select="'bar '"/>
    <sqf:stringReplace regex="(^foo$)" select="'bar'"/>
    <sqf:stringReplace regex="(\sfoo\s)" select="' bar '"/>
    <sqf:stringReplace regex="(\sfoo\.)|(^foo\.)" select="'bar.'"/>
    <sqf:stringReplace regex="(\sfoo\?)|(^foo\?)" select="'bar?'"/>
    <sqf:stringReplace regex="(\sfoo\!)|(^foo\!)" select="'bar!'"/>
    <sqf:stringReplace regex="(\sfoo\;)|(^foo\;)" select="'bar;'"/>
    <sqf:stringReplace regex="(\sFoo$)" select="' Bar'"/>
    <sqf:stringReplace regex="(^Foo\s)" select="'Bar '"/>
    <sqf:stringReplace regex="(^Foo$)" select="'Bar'"/>
    <sqf:stringReplace regex="(\sFoo\s)" select="' Bar '"/>
    <sqf:stringReplace regex="(\sFoo\.)|(^Foo\.)" select="'Bar.'"/>
    <sqf:stringReplace regex="(\sFoo\?)|(^Foo\?)" select="'Bar?'"/>
    <sqf:stringReplace regex="(\sFoo\!)|(^Foo\!)" select="'Bar!'"/>
    <sqf:stringReplace regex="(\sFoo\;)|(^Foo\;)" select="'Bar;'"/>
</sqf:fix>

Re: Need workaround for lookarounds in XPATH regex

Posted: Wed Mar 08, 2017 6:15 pm
by tavy
Hello,

In <oXygen/> XML Editor 18.1 we updated the sqf:stringReplace operation to allow using Java regular expressions. The Saxon "j" flag is automatically set for the regular expressions of the sqf:stringReplace operation.
You can read more about this in our user manual: https://www.oxygenxml.com/doc/versions/ ... tions.html

This will allow you to use "\b" in a regular expression to match word boundaries.
So you can create a quick fix like this:

<sqf:fix id="replace-foo">
    <sqf:description>
        <sqf:title>Replace 'foo' with an allowed term: 'bar'</sqf:title>
    </sqf:description>
    <sqf:stringReplace regex="\b(foo)\b" select="'bar'"/>
    <sqf:stringReplace regex="\b(Foo)\b" select="'Bar'"/>
</sqf:fix>
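
To see how "\b" behaves outside the editor, here is a minimal standalone sketch in plain Java. It is only an illustration and assumes the quick fix ends up using the same java.util.regex semantics. The class name WordBoundaryDemo and the helper replaceWord are made up for this example and are not part of oXygen or SQF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Hypothetical helper: replaces only whole-word occurrences of `word`.
    // \b matches at a position between a word character (letter, digit, underscore)
    // and a non-word character, or at the start/end of the input.
    static String replaceWord(String text, String word, String replacement) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Standalone word: replaced.
        System.out.println(replaceWord("lorem ipsum foo bar.", "foo", "bar"));
        // Substring of a longer word: left untouched.
        System.out.println(replaceWord("lorem ipsum foobar.", "foo", "bar"));
        // Followed by punctuation: replaced, punctuation kept.
        System.out.println(replaceWord("Is it foo?", "foo", "bar"));
    }
}
```

Because "\b" is zero-width, the surrounding whitespace and punctuation are not consumed, so they do not have to be repeated in the replacement. Note that an underscore counts as a word character, so 'foo' in 'foo_bar' is not matched, while 'foo' in 'foo-bar' is.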
Best Regards,
Octavian

Re: Need workaround for lookarounds in XPATH regex

Posted: Wed Mar 08, 2017 7:44 pm
by xephon
When I feel lost, you always pull a rabbit out of a hat. :wink:

Re: Need workaround for lookarounds in XPATH regex

Posted: Thu Mar 09, 2017 9:11 am
by xephon
It works perfectly.

You made my day, thanks a lot. :D