Regex not working properly in Find and Replace

Posted: Thu Nov 21, 2019 6:40 pm
by Gunnar
I'm trying to find and replace a simple HTML code snippet in an EPUB archive, where the text inside the <p> container is different each time. Using Regex I can do this easily in two other editors (UltraEdit and Visual Studio Code), but Oxygen has issues that make this impossible.

What I'm trying to do is add <b> around the image description text for each image (a few hundred images in multiple files inside an EPUB archive):

Original code:
<p>Image Description, different in each case</p>
<figure class="image"><img alt="image" src="image.jpg"/>
</figure>

Desired result:
<p><b>Image Description, different in each case</b></p>
<figure class="image"><img alt="image" src="image.jpg"/>
</figure>

I'm not a Regex guru in any way but the code below works, as said above, in UltraEdit and Visual Studio Code.

The Regex I used is this:

Find:
<p>(.*)</p>
<figure class="image">

Replace:
<p><b>$1</b></p>
<figure class="image">

What happens in Oxygen is that the match starts at the first <p> in the code and runs all the way to the last <figure class="image">, which of course makes this totally unworkable. Screenshot from Oxygen: https://imgur.com/7HB2okk

This behaves predictably (to me) in UltraEdit, see screenshot: https://imgur.com/hbfnKH0

Is this a bug in Oxygen or do I somehow need to tailor my Regex towards the Oxygen implementation of Regex? Or should I be using some other (Regex?) method?

P.S: I know that Regex generally is not good for HTML but I've used it in the past for simple tasks like this with good results.

Re: Regex not working properly in Find and Replace

Posted: Fri Nov 22, 2019 1:09 pm
by adrian
Hello,

That happens because you're using a greedy pattern, (.*). This matches as much as possible, and you've probably also checked the "Dot matches all" option, which lets the dot match line breaks as well, so the greedy match runs across multiple lines.
So, use the lazy pattern (.*?) instead of (.*), or clear the "Dot matches all" checkbox. I would recommend the former, but be aware of the latter as well.
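
As a rough illustration of the difference, here is a small sketch using Python's re module rather than Oxygen itself. It assumes that re.DOTALL behaves like the "Dot matches all" checkbox; the sample markup and the file names a.jpg and b.jpg are made up for the example.

```python
import re

# Made-up sample modeled on the snippet in the question (two image blocks).
html = (
    '<p>First description</p>\n'
    '<figure class="image"><img alt="image" src="a.jpg"/>\n'
    '</figure>\n'
    '<p>Second description</p>\n'
    '<figure class="image"><img alt="image" src="b.jpg"/>\n'
    '</figure>\n'
)

greedy = r'<p>(.*)</p>\n<figure class="image">'
lazy = r'<p>(.*?)</p>\n<figure class="image">'

# Assumption: re.DOTALL plays the role of "Dot matches all".
greedy_dotall = re.findall(greedy, html, flags=re.DOTALL)
lazy_dotall = re.findall(lazy, html, flags=re.DOTALL)

# Without DOTALL the dot stops at each line break.
greedy_plain = re.findall(greedy, html)

print(len(greedy_dotall))  # one oversized match spanning both blocks
print(lazy_dotall)         # one capture per image description
print(greedy_plain)        # also one capture per image description
```

With the dot allowed to cross line breaks, the greedy version swallows everything from the first <p> up to the last </p> that is followed by a <figure class="image"> line, which is the behavior described in the question.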

Regards,
Adrian

Re: Regex not working properly in Find and Replace

Posted: Wed Nov 27, 2019 6:59 pm
by Gunnar
Hi Adrian

Thank you, this works perfectly!

I had to do both. First, use this pattern:

Find:
<p>(.*?)</p>
<figure class="image">

and also uncheck the "Dot matches all" option.
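
One plausible reason both changes were needed: with the dot still matching line breaks, even the lazy pattern can start at an earlier, unrelated <p> and stretch forward to the first </p> that sits directly before a <figure class="image">. The sketch below shows this in Python's re module, not in Oxygen, so treat it as an assumption about how the two engines correspond. The "Ordinary paragraph" line is invented for the example, and Python writes the backreference as \1 where the Replace field above uses $1.

```python
import re

# Invented sample: an ordinary paragraph followed by one image block.
html = (
    '<p>Ordinary paragraph</p>\n'
    '<p>Image description</p>\n'
    '<figure class="image"><img alt="image" src="image.jpg"/>\n'
    '</figure>\n'
)

find = r'<p>(.*?)</p>\n<figure class="image">'
# Python uses \1 for the first group; the thread's Replace field uses $1.
replace = r'<p><b>\1</b></p>\n<figure class="image">'

# Assumption: re.DOTALL corresponds to "Dot matches all" being checked.
with_dotall = re.sub(find, replace, html, flags=re.DOTALL)

# Dot does not cross line breaks: only the image description is wrapped.
without_dotall = re.sub(find, replace, html)

print(with_dotall)
print(without_dotall)
```

In the first result the <b> opens inside the ordinary paragraph and closes inside the image description, which breaks the markup. In the second, only the paragraph directly before the figure is wrapped.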

My best
Gunnar