Regular expression matches but can't replace

toby
Posts: 6
Joined: Thu Mar 19, 2009 3:37 pm
Location: Tübingen, Germany

Post by toby »

Hi,
I'm using oXygen 9.3 and have a problem with the following regular expression:

Code:

(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)([ \t]*)(\n|\r|(\r\n))+([ \t]*)(?=</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)
Short explanation:
I'm searching for an XML tag followed by optional spaces or tabs, then at least one line break in any form, then optional spaces or tabs, then another XML tag.
The last XML-Tag is found by a lookahead-expression (?=...)

Try the regexp-search with the following code-snippet:

Code:


      <st:seite nr="2">

<st:einschub typ="2">

<st:abs ausricht="zentriert">Stefan Zweig<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="3lz"/>Gesammelte Werke<st:br/><ty:zw/> in

Einzelbänden<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="29lz"/>S. Fischer Verlag<ty:zw/></st:abs>

</st:einschub>

</st:seite>

<st:seite nr="3">

<st:einschub typ="2">

<st:abs ausricht="zentriert">Stefan Zweig<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="3lz"/>Maria Stuart<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="29lz"/>S. Fischer Verlag<ty:zw/></st:abs>

</st:einschub>

</st:seite>
A "normal" search matches 13 times.
Everything's great!

But if I want to replace all the line-breaks, tabs and spaces with

Code:

$1
- that is, the first XML tag (to create a "one-line XML"; don't ask why ;-) ),
I get the error:
wrong syntax:
match not possible
Can someone help me?

thanks!
-- toby
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: Regular expression matches but can't replace

Post by sorin_ristache »

Hello,
toby wrote:The last XML-Tag is found by a lookahead-expression (?=...)
That is not supported. You have to remove ?= from the expression:

Code:

(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)([ \t]*)(\n|\r|(\r\n))+([ \t]*)(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)
and use the replace expression $1$6 (the first XML tag followed directly by the second XML tag, without the whitespace between them).
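To make the group numbering behind $1$6 easier to follow, here is a minimal sketch of the same expression outside oXygen. It uses Python's re module purely as an assumption for illustration (it is not oXygen's engine, and Python writes the replacement as \1\6 rather than $1$6). Groups 2 to 5 hold the whitespace and line breaks, which is why the second tag ends up in group 6.

```python
import re

# The expression from this reply, split up so each capturing group is visible.
# Python's re module is used only for illustration; it is not oXygen's engine.
PATTERN = re.compile(
    r"(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)"  # group 1: first XML tag
    r"([ \t]*)"                               # group 2: spaces/tabs after it
    r"(\n|\r|(\r\n))+"                        # groups 3 and 4: line break(s)
    r"([ \t]*)"                               # group 5: spaces/tabs before next tag
    r"(</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>)"  # group 6: second XML tag
)

# Two lines taken from the snippet in the first post.
sample = '<st:seite nr="2">\n\n<st:einschub typ="2">'

match = PATTERN.search(sample)
first_tag, second_tag = match.group(1), match.group(6)

# Python spells the oXygen replacement $1$6 as \1\6.
collapsed = PATTERN.sub(r"\1\6", sample)
print(collapsed)
```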


Regards,
Sorin
toby
Posts: 6
Joined: Thu Mar 19, 2009 3:37 pm
Location: Tübingen, Germany

Re: Regular expression matches but can't replace

Post by toby »

Hi Sorin,

Thanks for your reply.

But why is the lookahead expression supported for searching but not for replacing? I don't get it...

Meanwhile, I figured out the same solution as yours, but there's still one thing that bothers me about it:
I always have to run the replace two or three times in order to replace all the matches.

Try the expression with the xml-snippet above and you'll hopefully see what I mean.

Just take the first 4 lines (plus the empty lines between them):

Code:


      <st:seite nr="2">

<st:einschub typ="2">

<st:abs ausricht="zentriert">Stefan Zweig<ty:zw/></st:abs>

<st:abs ausricht="zentriert"><ty:abstand wert="3lz"/>Gesammelte Werke<st:br/><ty:zw/> in
The first match covers the first two lines and removes the spaces and breaks between them; the second match covers the last two lines. But to remove the spaces and breaks between lines 2 and 3, I have to run the replace a second time.

Do you see another solution for that?
Or do you see a chance of supporting lookahead/lookbehind expressions in a future release of oXygen?
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: Regular expression matches but can't replace

Post by sorin_ristache »

Yes, lookahead is supported for searching but not for replacing. We will look into that.

The find/replace of Oxygen does not overlap the search results because that can lead to infinite loops, for example when replacing ab with aab. That means the second XML tag of a search result cannot be the first XML tag of the next search result because that would mean overlapping the two search results. The search is always resumed from the position where the previous search result ends.
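The effect of non-overlapping matches can be reproduced outside oXygen. The sketch below uses Python's re module as an assumption for illustration (it is not oXygen's engine); the names TAG, GAP and collapse_until_stable are made up for this example, and GAP is a simplified, non-capturing form of the whitespace part of the expression. It shows that the replaced text is not rescanned, that the consuming expression needs a second pass on three consecutive tag lines, and that an engine which does allow lookahead during replace (as Python's does) gets there in one pass because the second tag is never consumed.

```python
import re

# Tag alternatives from the thread; GAP is a simplified form of
# ([ \t]*)(\n|\r|(\r\n))+([ \t]*) without capturing groups.
TAG = r"</[^<]+?>|<[^/<]+?>|<[^/][^<]+?\/>"
GAP = r"[ \t]*(?:\r\n|\n|\r)+[ \t]*"

CONSUMING = re.compile(f"({TAG}){GAP}({TAG})")    # second tag is part of the match
LOOKAHEAD = re.compile(f"({TAG}){GAP}(?={TAG})")  # second tag is only peeked at

# Replaced text is not rescanned: the search resumes after each match.
no_rescan = re.sub("ab", "aab", "abab")

# Three consecutive tag lines from the snippet in the first post.
sample = (
    '<st:seite nr="2">\n\n'
    '<st:einschub typ="2">\n\n'
    '<st:abs ausricht="zentriert">Stefan Zweig<ty:zw/></st:abs>'
)

def collapse_until_stable(text):
    """Repeat the consuming replace until a pass changes nothing."""
    passes = 0
    while True:
        new_text, count = CONSUMING.subn(r"\1\2", text)
        if count == 0:
            return text, passes
        text, passes = new_text, passes + 1

stable, passes = collapse_until_stable(sample)

# With lookahead, each tag can end one match and start the next.
single_pass = LOOKAHEAD.sub(r"\1", sample)

print(passes)
print(stable == single_pass)
```

In oXygen itself, where lookahead is not available for replacing, the equivalent of the loop is simply to repeat Replace All with $1$6 until nothing more is replaced.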


Regards,
Sorin
toby
Posts: 6
Joined: Thu Mar 19, 2009 3:37 pm
Location: Tübingen, Germany

Re: Regular expression matches but can't replace

Post by toby »

Okay, I see...
Thanks for your effort :-)