advanced find and replace issue

ziborium
Posts: 1
Joined: Sat Feb 26, 2011 4:20 pm

advanced find and replace issue

Post by ziborium »

Dear friends, as an amateur at scripting I would be very thankful for any help with the following issue.

I have a large Word file of about 1100 pages in Chinese, a digitized encyclopedia from around 1900 AD. As a first step, this file was converted to an XML file to make it full-text searchable for a university project. During the conversion, the wavy underlines under key terms were lost. My task now is to search for these terms in the Word file, locate the same term in the XML file, and wrap it in <title></title> tags. To save (a lot of) time, I thought of automating this workflow.
I guess I need an external tool to do this, i.e. search the Word file for the term, find it in the XML file, and wrap it in the title tags. Or is there a way to do this with Oxygen?

The tricky thing is this: some terms appear several times in different contexts but are underlined only once or twice. Therefore I need to formulate a query that takes some of the surrounding Chinese characters into account in order to identify exactly which occurrence to wrap.

I would very much appreciate any help, even if it is only a hint on how to get started.
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: advanced find and replace issue

Post by sorin_ristache »

Hello,

Just open the XML file in Oxygen and use the Find/Replace dialog box to locate all occurrences of the terms and surround them with the title element tags. The Text to find and Replace with boxes accept Unicode text, so you can use Chinese characters. For example, if you want to replace all occurrences of termToReplace with <title>termToReplace</title>, type termToReplace in the Text to find box and <title>termToReplace</title> in the Replace with box.
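
If you would rather script this outside Oxygen, the same find-and-wrap step can be sketched in Python. This is only a rough illustration, not an Oxygen feature: the function name wrap_term, its before/after parameters, and the sample sentence are all made up for the example. It treats the XML as plain text and uses the characters directly before and after the term to pick out one occurrence.

```python
import re

def wrap_term(text, term, before="", after=""):
    """Wrap `term` in <title> tags.

    If `before` and/or `after` are given, only occurrences directly
    preceded by `before` and directly followed by `after` are wrapped.
    (Hypothetical helper, written for this example only.)
    """
    pattern = re.escape(term)
    if before:
        # Lookbehind: the context must be there but is not replaced.
        pattern = "(?<=" + re.escape(before) + ")" + pattern
    if after:
        # Lookahead: same idea for the characters after the term.
        pattern = pattern + "(?=" + re.escape(after) + ")"
    return re.sub(pattern, lambda m: "<title>" + m.group(0) + "</title>", text)

# Made-up sample text in which the term occurs twice.
sample = "<p>讀論語者多，論語二十篇。</p>"

# Wraps both occurrences.
print(wrap_term(sample, "論語"))

# Wraps only the occurrence that is followed by 二十.
print(wrap_term(sample, "論語", after="二十"))
```

Because the context is matched with lookbehind and lookahead, the surrounding characters are left untouched and only the term itself gets the tags. Work on a copy of the file, since a plain-text replace does not check that the result is still well-formed XML.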

You deal with the tricky part by formulating an XPath expression that restricts the scope of the find/replace to the context(s) where you want to do the surrounding. The XPath expression must match only the XML elements or attributes of the desired context(s), and you type it in the XPath box of the Find/Replace dialog. For example, if termToReplace appears in the elements para1 and para2 and in the subpara3 child elements of the para3 elements, but you want to surround it with title tags only in the para2 and subpara3 elements, then type in the XPath box:

Code:

para2 | para3/subpara3
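
For comparison, here is a minimal sketch of the same scoping idea done in a script, using Python's standard xml.etree.ElementTree. The function name wrap_in_scope and the sample document are invented for the example. ElementTree supports only a subset of XPath and has no union operator (|), so the two paths are passed as a list. To keep it short, the sketch wraps only the first occurrence in the text that directly follows an element's start tag.

```python
import xml.etree.ElementTree as ET

def wrap_in_scope(root, paths, term):
    """Wrap `term` in a <title> child element, but only inside
    elements matched by one of `paths` (hypothetical helper).

    Limitation: only the first occurrence in each element's leading
    text is handled; text after child elements is left alone.
    """
    for path in paths:
        for el in root.findall(path):
            text = el.text or ""
            idx = text.find(term)
            if idx == -1:
                continue
            title = ET.Element("title")
            title.text = term
            # Text after the term becomes the tail of the new element.
            title.tail = text[idx + len(term):]
            el.text = text[:idx]
            el.insert(0, title)

# Made-up document using the element names from the example above.
doc = (
    "<doc>"
    "<para1>see termToReplace here</para1>"
    "<para2>see termToReplace here</para2>"
    "<para3>termToReplace<subpara3>a termToReplace b</subpara3></para3>"
    "</doc>"
)
root = ET.fromstring(doc)
wrap_in_scope(root, [".//para2", ".//para3/subpara3"], "termToReplace")
print(ET.tostring(root, encoding="unicode"))
```

Only the para2 and subpara3 occurrences end up inside title elements. The occurrences in para1 and directly in para3 stay as they were.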

Regards,
Sorin