
Regular expression matches but can't replace

Posted: Thu Mar 19, 2009 3:51 pm
by toby
hi,
I'm using oXygen 9.3 and have a problem with the following regular expression:

Code:

(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)([ \t]*)(\n|\r|(\r\n))+([ \t]*)(?=</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)
Short explanation:
I'm searching for an XML tag followed by optional spaces or tabs, then at least one line break in any form (\n, \r or \r\n), then optional spaces or tabs, followed by another XML tag.
The last XML-Tag is found by a lookahead-expression (?=...)
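
For readers who want to see the pieces separately, here is a minimal sketch of the same expression written with Python's `re` module in verbose mode. Python is only a stand-in here (oXygen uses its own engine), and the short sample string is made up for illustration:

```python
import re

# The expression from the post, split into its parts.
# Assumption: Python's `re` engine, not the one built into oXygen.
PATTERN = re.compile(r"""
    (</[^<]+?>|<[^/<]+?>|<[^/][^<]+?/>)      # group 1: closing, opening, or empty-element tag
    ([ \t]*)                                 # group 2: optional spaces or tabs
    (\n|\r|(\r\n))+                          # groups 3 and 4: one or more line breaks
    ([ \t]*)                                 # group 5: optional spaces or tabs
    (?=</[^<]+?>|<[^/<]+?>|<[^/][^<]+?/>)    # lookahead: the next tag, checked but not consumed
""", re.VERBOSE)

sample = "<a>  \n\t<br/>"   # hypothetical input
m = PATTERN.search(sample)

# The match stops before the second tag because the lookahead consumes nothing.
print(repr(m.group(0)))     # '<a>  \n\t'
print(repr(m.group(1)))     # '<a>'
```

Because the second tag is never part of the match, replacing each match with group 1 alone would be enough in an engine that accepts lookahead in replace operations.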

Try the regexp-search with the following code-snippet:

Code:


      <st:seite nr="2">

<st:einschub typ="2">

<st:abs ausricht="zentriert">Stefan Zweig<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="3lz"/>Gesammelte Werke<st:br/><ty:zw/> in

Einzelbänden<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="29lz"/>S. Fischer Verlag<ty:zw/></st:abs>

</st:einschub>

</st:seite>

<st:seite nr="3">

<st:einschub typ="2">

<st:abs ausricht="zentriert">Stefan Zweig<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="3lz"/>Maria Stuart<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="29lz"/>S. Fischer Verlag<ty:zw/></st:abs>

</st:einschub>

</st:seite>
A "normal" search matches 13 times.
Everything's great!

But if I want to replace all the line-breaks, tabs and spaces with

Code:

$1
- the first XML-tag (to create a "one-line-XML" (don't ask why ;-) ))
I get the error:
wrong syntax:
match not possible
Can someone help me?

thanks!
-- toby

Re: Regular expression matches but can't replace

Posted: Thu Mar 19, 2009 6:10 pm
by sorin_ristache
Hello,
toby wrote:The last XML-Tag is found by a lookahead-expression (?=...)
That is not supported. You have to remove ?= from the expression, which turns the lookahead into an ordinary capturing group:

Code:

(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)([ \t]*)(\n|\r|(\r\n))+([ \t]*)(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)
and use the replace expression $1$6 (only the first XML tag and the second XML tag, without the whitespace between them).
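
Why $6 and not $2: groups are numbered by their opening parenthesis, and the nested (\r\n) counts as group 4. A minimal sketch in Python's `re` module shows the numbering (an assumption for illustration only: Python writes the replacement as `\1\6` where oXygen uses `$1$6`, and the sample string is made up):

```python
import re

# Tag alternatives from the expression above: closing, opening, empty-element.
TAG = r"</[^<]+?>|<[^/<]+?>|<[^/][^<]+?/>"

# Without the lookahead, the trailing tag is consumed and captured as group 6:
# 1 = first tag, 2 = spaces/tabs, 3 = line break, 4 = nested (\r\n),
# 5 = spaces/tabs, 6 = second tag.
PATTERN = re.compile(rf"({TAG})([ \t]*)(\n|\r|(\r\n))+([ \t]*)({TAG})")

sample = "<a>\n\n  <b>text</b>"   # hypothetical input
m = PATTERN.search(sample)
print(m.group(1), m.group(6))     # <a> <b>

# Keep both tags, drop everything between them.
result = PATTERN.sub(r"\1\6", sample)
print(result)                     # <a><b>text</b>
```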


Regards,
Sorin

Re: Regular expression matches but can't replace

Posted: Thu Mar 19, 2009 6:25 pm
by toby
hi sorin,

thanks for your reply.

but why is the lookahead-expression supported for searching but not for replacing? I don't get it...

Meanwhile, I figured out the same solution as yours, but there's still one thing that bothers me about it:
I always have to run the replace two or three times in order to replace all the matches.

Try the expression with the xml-snippet above and you'll hopefully see what I mean.

Just take the first 4 lines (plus the empty lines between them):

Code:


      <st:seite nr="2">

<st:einschub typ="2">

<st:abs ausricht="zentriert">Stefan Zweig<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="3lz"/>Gesammelte Werke<st:br/><ty:zw/> in
The first match covers the first two lines and removes the spaces and breaks between them; the second match covers the last two lines. But to remove the spaces and breaks between lines 2 and 3, I have to run the replace a second time.

Do you see another solution for that?
Or do you see a chance of supporting lookahead / lookbehind expressions in a future release of oXygen?

Re: Regular expression matches but can't replace

Posted: Thu Mar 19, 2009 6:49 pm
by sorin_ristache
Yes, lookahead is supported for searching but not for replacing. We will look into that.

The find/replace of Oxygen does not overlap the search results because that can lead to infinite loops, for example when replacing ab with aab. That means the second XML tag of a search result cannot be the first XML tag of the next search result because that would mean overlapping the two search results. The search is always resumed from the position where the previous search result ends.
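
The effect can be reproduced outside Oxygen. The following is a minimal sketch in Python's `re` module, which also resumes each search where the previous match ended. It is only an illustration under that assumption, not Oxygen's implementation; the helper `collapse_repeatedly` and the sample string are hypothetical:

```python
import re

TAG = r"</[^<]+?>|<[^/<]+?>|<[^/][^<]+?/>"
WS = r"[ \t]*(?:\r\n|\n|\r)+[ \t]*"   # spaces/tabs around at least one line break

# Variant 1: the second tag is consumed, so it cannot start the next match.
consuming = re.compile(rf"({TAG}){WS}({TAG})")
# Variant 2: the second tag is only looked at, so it stays available.
lookahead = re.compile(rf"({TAG}){WS}(?={TAG})")

def collapse_repeatedly(text):
    """Hypothetical helper: rerun the consuming replace until nothing changes."""
    passes = 0
    while True:
        new_text, count = consuming.subn(r"\1\2", text)
        if count == 0:
            return text, passes
        text, passes = new_text, passes + 1

sample = "<a>\n<b>\n<c>\n<d>"

# One pass joins <a><b> and <c><d>, but misses the break between <b> and <c>.
print(repr(consuming.sub(r"\1\2", sample)))   # '<a><b>\n<c><d>'

# Repeating the replace until it finds nothing takes two passes here.
print(collapse_repeatedly(sample))            # ('<a><b><c><d>', 2)

# With a lookahead, an engine that supports it in replace needs one pass.
print(lookahead.sub(r"\1", sample))           # <a><b><c><d>
```

In an editor without lookahead support in replace, running "Replace All" again until it reports no matches has the same effect as the loop above.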


Regards,
Sorin

Re: Regular expression matches but can't replace

Posted: Fri Mar 20, 2009 3:09 pm
by toby
okay, I see...
thanks for your effort :-)