advanced find and replace issue
Dear friends, as an amateur at scripting I would be very thankful for any help with the following issue.
I have a large Word file of about 1100 pages in Chinese, a digitized encyclopedia from around 1900 AD. As a first step, this file was converted into an XML file to make it full-text searchable for a university project. During the conversion, the wavy underlines under key terms were lost. My task now is to search for these terms in the Word file, locate the same term in the XML file, and wrap it in a <title></title> tag. To save (a lot of) time, I thought of automating this workflow.
I guess I need an external tool to do this, i.e. search the Word file for the term, find it in the XML file, and wrap it in the title tag. Or is there a way to do this with Oxygen?
The tricky thing is this: some terms appear several times in different contexts but are underlined only once or twice. I therefore need to formulate a query that takes some of the surrounding Chinese characters into account in order to identify exactly which occurrence to wrap.
I would appreciate any help very much, even if it is only a hint on how to get started.
Re: advanced find and replace issue
Post by sorin_ristache »
Hello,
Open the XML file in Oxygen and use the Find/Replace dialog box to locate all occurrences of the terms and surround them with the title element tags. The Text to find and Replace with boxes accept Unicode text, so you can enter Chinese characters. For example, to replace all occurrences of termToReplace with <title>termToReplace</title>, type termToReplace in the Text to find box and <title>termToReplace</title> in the Replace with box.
You handle the tricky part by writing an XPath expression that restricts the scope of the find/replace to the context(s) where the term should be wrapped. The expression must match only the XML elements or attributes of the desired context(s), and you type it in the XPath box of the Find/Replace dialog. For example, if termToReplace appears in the elements para1 and para2 and in the subpara3 child elements of the para3 elements, but you want to surround it with title tags only in para2 and subpara3, type in the XPath box:
para2 | para3/subpara3
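If an external script is preferred over the dialog, the same scoping idea can be sketched in Python with the standard xml.etree.ElementTree module. This is only a rough illustration, not what Oxygen does internally: the helper names wrap_in_text and wrap_scoped and the sample document are made up, ElementTree has no XPath union operator (so the two paths are passed as a list), and only the text directly at the start of each matched element is handled.

```python
import xml.etree.ElementTree as ET


def wrap_in_text(elem, term, tag="title"):
    """Wrap each occurrence of term in elem's leading text with <tag>.

    Hypothetical helper: text that follows child elements (their tails)
    is deliberately left untouched to keep the sketch short.
    """
    if not elem.text or term not in elem.text:
        return 0
    parts = elem.text.split(term)
    elem.text = parts[0]
    # Each new <tag> element carries the text that followed the term as its tail.
    for i, following in enumerate(parts[1:]):
        child = ET.Element(tag)
        child.text = term
        child.tail = following
        elem.insert(i, child)
    return len(parts) - 1


def wrap_scoped(root, term, paths):
    """Apply wrap_in_text only to elements matched by the given paths."""
    count = 0
    for path in paths:
        for elem in root.findall(path):
            count += wrap_in_text(elem, term)
    return count


# Made-up sample mirroring the para1/para2/para3 example above.
doc = ET.fromstring(
    "<doc><para1>termToReplace here</para1>"
    "<para2>see termToReplace now</para2>"
    "<para3><subpara3>termToReplace twice termToReplace</subpara3></para3></doc>"
)
n = wrap_scoped(doc, "termToReplace", [".//para2", ".//para3/subpara3"])
result = ET.tostring(doc, encoding="unicode")
print(n)
print(result)
```

The para1 element is not in the list of paths, so its occurrence stays unwrapped, which is the same effect the XPath box has in the dialog.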
Regards,
Sorin
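For the case raised in the question, where a term must be told apart by its neighbouring characters rather than by its parent element, a regular expression with lookbehind and lookahead is one possible approach. The following is a minimal sketch under stated assumptions: the function name wrap_with_context is hypothetical, the Chinese sample sentence is invented for illustration, and the pattern is applied to plain text, so it does not check whether a match sits inside markup.

```python
import re


def wrap_with_context(text, term, before="", after="", tag="title"):
    """Wrap term with <tag> only where it is directly preceded by `before`
    and directly followed by `after` (either may be empty).

    Returns the new text and the number of replacements made.
    """
    pattern = ""
    if before:
        # Lookbehind: the context must be present but is not replaced.
        pattern += "(?<=" + re.escape(before) + ")"
    pattern += re.escape(term)
    if after:
        # Lookahead: same idea for the characters after the term.
        pattern += "(?=" + re.escape(after) + ")"
    replacement = "<{0}>{1}</{0}>".format(tag, term)
    # A function is used as the replacement so it is inserted literally.
    return re.subn(pattern, lambda m: replacement, text)


# Invented sample: the term occurs twice, but only once after the character 讀.
sample = "孔子讀論語。弟子問論語於師。"
wrapped, count = wrap_with_context(sample, "論語", before="讀")
print(count)
print(wrapped)
```

Each underlined term taken from the Word file would need its own before/after context, chosen long enough that only the intended occurrence matches.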