Need workaround for lookarounds in XPATH regex

xephon

Wed Mar 08, 2017 2:15 pm

Hi,

It seems that the regular expressions in <sqf:stringReplace/> (in XSLT/Schematron) do not support lookarounds. I need a way to match words that are not part of a longer string. Without lookarounds (e.g. foo(?=\s)), this gets very ugly.

For example, when trying to match 'foo', this should match:

Code:

lorem ipsum foo bar.


But this should not match, because 'foo' is a substring of 'foobar':

Code:

lorem ipsum foobar.


I work around this by also matching the preceding and following characters. This is really dirty, but I cannot find a better solution. Do you have an implementation idea for me?

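<sqf:fix id="replace-foo">
     <sqf:description>
         <sqf:title>Replace 'foo' with an allowed term: 'bar'</sqf:title>
     </sqf:description>
     <!-- 'foo' delimited by whitespace or by the start/end of the string -->
     <sqf:stringReplace regex="(\sfoo$)" select="' bar'"/>
     <sqf:stringReplace regex="(^foo\s)" select="'bar '"/>
     <sqf:stringReplace regex="(^foo$)" select="'bar'"/>
     <sqf:stringReplace regex="(\sfoo\s)" select="' bar '"/>
     <!-- 'foo' followed by punctuation; the whitespace and start-of-string
          cases are kept separate so that the leading space is not dropped -->
     <sqf:stringReplace regex="(\sfoo\.)" select="' bar.'"/>
     <sqf:stringReplace regex="(^foo\.)" select="'bar.'"/>
     <sqf:stringReplace regex="(\sfoo\?)" select="' bar?'"/>
     <sqf:stringReplace regex="(^foo\?)" select="'bar?'"/>
     <sqf:stringReplace regex="(\sfoo!)" select="' bar!'"/>
     <sqf:stringReplace regex="(^foo!)" select="'bar!'"/>
     <sqf:stringReplace regex="(\sfoo;)" select="' bar;'"/>
     <sqf:stringReplace regex="(^foo;)" select="'bar;'"/>
     <!-- The same cases for the capitalized form -->
     <sqf:stringReplace regex="(\sFoo$)" select="' Bar'"/>
     <sqf:stringReplace regex="(^Foo\s)" select="'Bar '"/>
     <sqf:stringReplace regex="(^Foo$)" select="'Bar'"/>
     <sqf:stringReplace regex="(\sFoo\s)" select="' Bar '"/>
     <sqf:stringReplace regex="(\sFoo\.)" select="' Bar.'"/>
     <sqf:stringReplace regex="(^Foo\.)" select="'Bar.'"/>
     <sqf:stringReplace regex="(\sFoo\?)" select="' Bar?'"/>
     <sqf:stringReplace regex="(^Foo\?)" select="'Bar?'"/>
     <sqf:stringReplace regex="(\sFoo!)" select="' Bar!'"/>
     <sqf:stringReplace regex="(^Foo!)" select="'Bar!'"/>
     <sqf:stringReplace regex="(\sFoo;)" select="' Bar;'"/>
     <sqf:stringReplace regex="(^Foo;)" select="'Bar;'"/>
</sqf:fix>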
Join the DOCTALES DITA development team: https://doctales.atlassian.net/wiki
tavy

Wed Mar 08, 2017 6:15 pm

Hello,

In <oXygen/> XML Editor 18.1 we updated the sqf:stringReplace operation to allow using Java regular expressions. The Saxon "j" flag is automatically set for the regular expressions of the sqf:stringReplace operation.
You can read more about this in our user manual: https://www.oxygenxml.com/doc/versions/18.1/ug-editor/topics/sqf-operations.html

This allows you to use "\b" in a regular expression to match word boundaries.
So you can create a quick fix like this:

Code:

<sqf:fix id="replace-foo">
     <sqf:description>
         <sqf:title>Replace 'foo' with an allowed term: 'bar'</sqf:title>
     </sqf:description>
     <sqf:stringReplace regex="\b(foo)\b" select="'bar'"/>
     <sqf:stringReplace regex="\b(Foo)\b" select="'Bar'"/>
</sqf:fix>
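
To check the word-boundary behaviour outside the editor, the same pattern can be tried in plain Java, on the assumption that the "j" flag makes Saxon evaluate the expression with java.util.regex. This is only a minimal sketch: the class name WordBoundaryDemo and the helper replaceWord are hypothetical names made up for illustration and are not part of SQF or oXygen.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {

    // Hypothetical helper (not an SQF or oXygen API): replaces only
    // whole-word occurrences of `word`, using Java's \b assertion.
    static String replaceWord(String text, String word, String replacement) {
        // Pattern.quote keeps regex metacharacters in `word` literal.
        Pattern p = Pattern.compile("\\b(" + Pattern.quote(word) + ")\\b");
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // 'foo' as a separate word is replaced.
        System.out.println(replaceWord("lorem ipsum foo bar.", "foo", "bar"));
        // 'foo' inside 'foobar' is left alone.
        System.out.println(replaceWord("lorem ipsum foobar.", "foo", "bar"));
        // Punctuation counts as a boundary; matching is case-sensitive.
        System.out.println(replaceWord("foo; Foo? foo!", "foo", "bar"));
        // The lookahead from the question is also valid Java regex syntax.
        System.out.println("lorem ipsum foo bar.".replaceAll("foo(?=\\s)", "bar"));
    }
}
```

Because "\b" is a zero-width assertion, the surrounding whitespace and punctuation are never part of the match, so the replacement string does not have to repeat them.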


Best Regards,
Octavian
Octavian Nadolu
<oXygen/> XML Editor
http://www.oxygenxml.com
xephon

Wed Mar 08, 2017 7:44 pm

When I feel lost, you always pull a rabbit out of a hat. ;-)
xephon

Thu Mar 09, 2017 9:11 am

It works perfectly.

You made my day, thanks a lot. :D
