regex in find replace
Dear all,
I have a file where I need to replace some information with tags using regex in the Find/Replace dialog. However, I am not able to get the regex right.
The text is like this:
Code: Select all
<sense>pron. rel. et conj. rel. (gramm. § 147; gramm. § 169,5).
A) Pron. rel.: Sing. m. ዘ፡, fem. እንተ፡, Pl. c. እለ፡ <i>qui</i>, <i>quae</i>,
<i>quod</i>. 1) De constructione hujus pronominis B) Sx. Sen. 7 Enc.
</sense>
I need to match everything that follows A) and everything that follows B). In regexer I found that this expression works fine:
Code: Select all
((\s)([A-Z])(\))(\s))(.*?)(?=((\s)([A-Z])(\)(\s))|$))
The first match is
Code: Select all
A) Pron. rel.: Sing. m. ዘ፡, fem. እንተ፡, Pl. c. እለ፡ <i>qui</i>, <i>quae</i>,
<i>quod</i>. 1) De constructione hujus pronominis
and the second
Code: Select all
B) Sx. Sen. 7 Enc.
I tried to convert it to the required regex dialect for oXygen, but without success.
I have changed [A-Z] to \p{Upper} as indicated in another forum post, added (?s) to make the match not greedy, and restricted the path to the sense element:
Code: Select all
((\s)(\p{Upper})(\))(\s))((?s).*?)(?=((\s)(\p{Upper})(\)(\s))|$))
Nevertheless, this does not work, and I get only
Code: Select all
A) Pron. rel.: Sing. m. ዘ፡, fem. እንተ፡, Pl. c. እለ፡ <i>qui</i>, <i>quae</i>,
as first match and
Code: Select all
B) Sx. Sen. 7 Enc.
as second match. The variants
Code: Select all
((\s)(\p{Upper})(\))(\s))((?s).*)(?=((\s)(\p{Upper})(\)(\s))|$))
and
Code: Select all
((\s)(\p{Upper})(\))(\s))(.*?)(?=((\s)(\p{Upper})(\)(\s))|$))
both return one match only:
Code: Select all
A) Pron. rel.: Sing. m. ዘ፡, fem. እንተ፡, Pl. c. እለ፡ <i>qui</i>, <i>quae</i>,
<i>quod</i>. 1) De constructione hujus pronominis B) Sx. Sen. 7 Enc.
I do not understand where my mistake is. I have tried changing the greediness without success.
Thanks a lot for any advice or help on how to make this work!
Re: regex in find replace
Hi,
I started with your initial regexp and cleaned up the redundant parentheses (kept just a few):
Code: Select all
\s[A-Z]\)\s(.*?)(?=(\s[A-Z]\)\s)|$)
The first thing to note is that in Oxygen '.' (dot) matches any character except line terminators. You can make it match everything by checking the option "Dot matches all", or you can add the flag (?s) at the beginning of your expression.
The problem, however, is the '$' (dollar) in the expression. Some regexp engines interpret $ as EOF, others as EOL. Oxygen is in the latter category (EOL).
Since your match is lazy (.*?), it stops after finding the shortest match (ending in EOL).
I don't have a proper solution for this one. The problem is that you want your first match to span multiple lines, ignoring the line terminator, but want your second match to end at the line terminator. As far as I can tell, you can't have it both ways. With a lazy match, either all matches end at line terminators or none do (I picked the latter).
What you can do is use \z (end of input) instead of $ (EOL), and you get this:
Code: Select all
(?s)\s[A-Z]\)\s(.*?)((?=(\s[A-Z]\)\s)|\z))
This finds the first match as you expect, but the second spans all the way to the end of the file (includes the end tag).
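For anyone who wants to try the difference outside Oxygen, here is a minimal sketch in Python. It is only an approximation: Python's `re` module is not the engine Oxygen uses, so `re.MULTILINE` is used to imitate the "$ means EOL" behaviour described above, and Python's `\Z` stands in for `\z`. The helper `sections` and the third variant (stopping before the closing `</sense>` tag) are assumptions added for illustration and have not been checked in Oxygen.

```python
import re

# Sample text from the original post.
TEXT = (
    "<sense>pron. rel. et conj. rel. (gramm. § 147; gramm. § 169,5).\n"
    "A) Pron. rel.: Sing. m. ዘ፡, fem. እንተ፡, Pl. c. እለ፡ <i>qui</i>, <i>quae</i>,\n"
    "<i>quod</i>. 1) De constructione hujus pronominis B) Sx. Sen. 7 Enc.\n"
    "</sense>"
)

# Section marker such as " A) ": whitespace, uppercase letter, ")", whitespace.
MARKER = r"\s[A-Z]\)\s"


def sections(end, flags):
    """Lazily capture the text after each marker, up to the next marker or `end`.

    Hypothetical helper for this sketch; `end` is a regex fragment that is
    placed inside the lookahead as the alternative to the next marker.
    """
    pattern = re.compile(MARKER + r"(.*?)(?=" + MARKER + r"|" + end + r")", flags)
    return [m.group(1) for m in pattern.finditer(TEXT)]


# 1) "$" as end of line (re.MULTILINE imitates the EOL behaviour):
#    the lazy match stops at the first line break it reaches.
eol = sections(r"$", re.DOTALL | re.MULTILINE)

# 2) End of input (Python spells it \Z): the first match spans both lines,
#    but the last match runs to the end of the text.
eof = sections(r"\Z", re.DOTALL)

# 3) Assumption, not from the thread: also stop before the closing </sense> tag.
tag = sections(r"\s*</sense>", re.DOTALL)

for name, result in (("eol", eol), ("eof", eof), ("tag", tag)):
    print(name, result)
```

On this sample, the first variant cuts the A) section at the end of its first line, the second returns the full A) section but lets the B) section swallow `</sense>`, and the third returns both sections without the tag. Whether the third lookahead is usable depends on every section ending either at the next marker or at the closing tag.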
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com