RegEx to find text in a map

psbentley
Posts: 20
Joined: Mon Aug 04, 2014 5:18 pm

RegEx to find text in a map

Post by psbentley »

Hi folks,

I'm terrible at RegEx and I'm hoping someone here can help me. I have a bunch of maps that have topicrefs that contain two elements, a difficulty and a topicsubject, like this:

Code:

<topicref href="../questions/cfaL1_question_00086_13.dita">
<topicsubject keyref="cfa_l1_los_567"/>
<difficulty value="intermediate"/>
</topicref>
Each map could have one or a hundred of these topicrefs. I'm trying to do a find that will give me each one of these as a separate result. I have tried a variety of regex, but each time, the result gives me the first <topicref all the way down to the last </topicref>, so it's not terribly helpful. This is the code I've been using:

Code:

(topicref href=").*[">][\r\n][ ]*(<topicsubject keyref=").*["\/>][\r\n][ ]*(<difficulty value=").*[">][\r\n][ ]*(<\/topicref>)
Does anyone have any tips on how to have each <topicref> container return a result?

Thanks!
Peyton
Radu
Posts: 9059
Joined: Fri Jul 09, 2004 5:18 pm

Re: RegEx to find text in a map

Post by Radu »

Hi Peyton,

By default, quantifiers in regular expressions like ".*" are greedy: they match as much text as possible. You can make them non-greedy by appending a "?", as in ".*?":

https://stackoverflow.com/questions/230 ... xpressions
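
To illustrate the greedy versus non-greedy difference, here is a minimal Python sketch. The sample map and the variable names are made up for illustration (they are not Peyton's files), and Python's `re.DOTALL` flag is used here as a stand-in for a "dot matches all" option:

```python
import re

# Hypothetical two-topicref map, modelled on the snippet in the question.
SAMPLE_MAP = """<map>
<topicref href="a.dita">
<topicsubject keyref="k1"/>
<difficulty value="easy"/>
</topicref>
<topicref href="b.dita">
<topicsubject keyref="k2"/>
<difficulty value="hard"/>
</topicref>
</map>"""

# Greedy: ".*" runs to the LAST </topicref>, so everything is one match.
greedy = re.findall(r"<topicref .*</topicref>", SAMPLE_MAP, flags=re.DOTALL)

# Non-greedy: ".*?" stops at the FIRST </topicref> it can reach,
# so each topicref becomes its own match.
lazy = re.findall(r"<topicref .*?</topicref>", SAMPLE_MAP, flags=re.DOTALL)

print(len(greedy), len(lazy))
```

With this sample, the greedy pattern yields a single match spanning both topicrefs, while the non-greedy one yields one match per topicref.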

That said, for searching XML structure you should try our XPath Builder view. You can change its scope to run on multiple files and then run a simple XPath expression like this one:

Code:

//topicref[topicsubject[@keyref]][difficulty[@value]]
Why is a regexp not a good idea when searching for XML content? Because you may have situations like this:

Code:

<topicsubject keyref="cfa_l1_los_567"></topicsubject>

The tag is written in expanded form, which is equivalent to the collapsed form, but this is difficult to express with a regexp.
Or you may have XML comments somewhere inside the topicref, which the regexp would also need to account for. This leads to expressions that are very hard to read and may still not find all cases.
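
As a rough illustration of why a parser copes with these cases, here is a small Python sketch using the standard `xml.etree.ElementTree` module instead of Oxygen's XPath Builder. The sample map and the helper name `matching_topicrefs` are assumptions for the example; ElementTree supports only a limited XPath subset, so the two predicates are checked one at a time:

```python
import xml.etree.ElementTree as ET

# Hypothetical map: the first topicref uses the expanded tag form and
# contains a comment, the third has no child elements at all.
SAMPLE_MAP = """<map>
  <topicref href="a.dita">
    <!-- reviewed -->
    <topicsubject keyref="k1"></topicsubject>
    <difficulty value="easy"/>
  </topicref>
  <topicref href="b.dita">
    <topicsubject keyref="k2"/>
    <difficulty value="hard"/>
  </topicref>
  <topicref href="c.dita"/>
</map>"""

def matching_topicrefs(root):
    """Return topicrefs that have both a topicsubject with @keyref
    and a difficulty with @value as direct children."""
    return [
        ref for ref in root.iter("topicref")
        if ref.find("topicsubject[@keyref]") is not None
        and ref.find("difficulty[@value]") is not None
    ]

root = ET.fromstring(SAMPLE_MAP)
hrefs = [ref.get("href") for ref in matching_topicrefs(root)]
print(hrefs)
```

The expanded tag and the comment need no special handling here, because the parser normalizes both before the query runs.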

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Re: RegEx to find text in a map

Post by psbentley »

That worked perfectly! Thanks!!

Re: RegEx to find text in a map

Post by psbentley »

Actually, it's not exactly what I needed (I just checked the XML doc). I see the topicref, but is there a way to pull that whole chunk? I really just need to know the file name and the difficulty value.

Re: RegEx to find text in a map

Post by Radu »

Hi,

Two possible ways:

1) Run an XPath like this:

Code:

//topicref[topicsubject[@keyref]][difficulty[@value]]/concat(@href, '  -  ', difficulty/@value)
The problem is that double-clicking a result will not take you to the place where the topicref is located.

2) In the Find/Replace in Files dialog, set the expression to find to:

Code:

(.*)
Check the "Regular Expressions" and "Dot matches all" checkboxes.
Set the "Restrict to XPath" field to:

Code:

//topicref[topicsubject[@keyref]][difficulty[@value]]
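
If you ever need the report from option 1 outside Oxygen, something along these lines might work. This is only a sketch in Python with the standard `xml.etree.ElementTree` module; the function name `difficulty_report` and the sample map are invented for the example, and the output format simply mimics the concat() call above:

```python
import xml.etree.ElementTree as ET

# Hypothetical map content, modelled on the snippet in the question.
SAMPLE_MAP = """<map>
  <topicref href="a.dita">
    <topicsubject keyref="k1"/>
    <difficulty value="easy"/>
  </topicref>
  <topicref href="b.dita">
    <topicsubject keyref="k2"/>
    <difficulty value="hard"/>
  </topicref>
</map>"""

def difficulty_report(xml_text):
    """Return one 'href  -  difficulty' line per qualifying topicref."""
    lines = []
    for ref in ET.fromstring(xml_text).iter("topicref"):
        subject = ref.find("topicsubject[@keyref]")
        difficulty = ref.find("difficulty[@value]")
        # Skip topicrefs that lack either child, as the XPath predicates do.
        if subject is None or difficulty is None:
            continue
        lines.append(f"{ref.get('href')}  -  {difficulty.get('value')}")
    return lines

report = difficulty_report(SAMPLE_MAP)
print("\n".join(report))
```

For a topicref without an href attribute, `ref.get('href')` returns None, so you may want to filter those out or supply a default.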
Regards,
Radu