advanced find and replace issue
Posted: Sat Feb 26, 2011 4:33 pm
Dear friends, as a newcomer to scripting I would be very thankful for any help with the following issue.
I have a large Word file of about 1100 pages in Chinese, a digitized encyclopedia from around 1900 AD. As a first step, this file was converted into an XML file to make it full-text searchable for a university project. During the conversion, the wavy underlines under key terms were lost. My task now is to search for these terms in the Word file, locate the same term in the XML file and wrap it in a <title></title> tag. To save (a lot of) time, I thought of automating this workflow.
I guess I need an external tool to do this, i.e. search the Word file for the term, find it in the XML file and wrap it in the title tag. Or is there a way to do this with oXygen?
The tricky thing is this: some terms appear several times in different contexts, but are underlined only once or twice. Therefore I need to formulate a query that takes some of the surrounding Chinese characters into account in order to identify exactly which occurrence to wrap.
I would appreciate any help very much, even if it is only a hint on how to get started.
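To make the idea of a context-sensitive query more concrete, here is a rough sketch in Python of what I imagine, not a finished solution. It assumes that the term and its neighbouring characters sit in the same run of text in the XML file, with no other markup in between. The function name wrap_term, its parameters and the sample sentence are all made up for illustration.

```python
import re

def wrap_term(xml_text, term, before="", after="", tag="title"):
    """Wrap `term` in <tag>...</tag>, but only where it is directly
    preceded by `before` and directly followed by `after`.

    Returns (new_text, number_of_replacements).
    Assumption: no markup separates the term from its context characters.
    """
    # re.escape keeps any regex metacharacters in the inputs literal.
    pattern = re.compile(
        "(" + re.escape(before) + ")"
        "(" + re.escape(term) + ")"
        "(" + re.escape(after) + ")"
    )

    def repl(match):
        # Re-emit the context unchanged and wrap only the term itself.
        return f"{match.group(1)}<{tag}>{match.group(2)}</{tag}>{match.group(3)}"

    return pattern.subn(repl, xml_text)

# Hypothetical sample: the term occurs twice, but only the occurrence
# preceded by "作" and followed by "。" should be wrapped.
sample = "<p>孔子作春秋。春秋者,史也。</p>"
wrapped, count = wrap_term(sample, "春秋", before="作", after="。")
print(count)
print(wrapped)
```

The returned count could serve as a check: if it is not exactly 1, the chosen context characters are not specific enough (or do not match at all), and that term would need more context or manual review.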