Convert Diacritical Unicode Character and Punctuation Codes
-
- Posts: 5
- Joined: Mon Aug 24, 2009 6:45 pm
I have a TEI Tite XML file containing a French text with thousands of diacritical characters. Though the document header declares it is encoded in UTF-8...
Code:
<?xml version="1.0" encoding="utf-8"?>
... the codes for diacritical characters and punctuation are in a different format, i.e.:
Code:
s’est écoulé jusqu’à son réveil; mais leurs rangs peuvent se mêler
I am trying to use the PhiloLogic text mining tool to analyze this text, but it won't find diacritical characters or punctuation unless they're in UTF-8.
How do I convert only the diacritical character and punctuation codes, as above, to UTF-8 codes?
Thanks,
Jeff
-
- Posts: 5
- Joined: Mon Aug 24, 2009 6:45 pm
Re: Convert Diacritical Unicode Character and Punctuation Codes
Sorry, forgot to say I've been trying to use oXygen to change those codes but just can't seem to find anything about it in the help system. Hoping someone here can point me in the right direction.
Best,
Jeff
-
- Posts: 9434
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Convert Diacritical Unicode Character and Punctuation Codes
Hi Jeff,
First of all, characters escaped as numeric character references (for example, &#233; for é) are perfectly legal in XML documents, so a tool that does not handle them properly is not fully XML conformant.
To convert them, select the entire content of the XML file in Oxygen, right-click, and choose Source->Unescape selection from the contextual menu. Uncheck all the checkboxes and then check only the "Unescape Characters" checkbox. This should do what you want.
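If you would rather script the conversion than do it in the editor, here is a minimal Python sketch of the same idea. It is only an illustration, not what Oxygen does internally: the function name unescape_numeric_refs and the sample string are made up for this example, and the sample assumes the file uses numeric references such as &#8217; and &#233;. References to markup characters (<, >, &, quotes) and to control characters are left escaped so the file stays well-formed.

```python
import re
import xml.etree.ElementTree as ET

# Matches decimal (&#233;) and hexadecimal (&#xE9;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Markup-significant characters must stay escaped, or the XML could break.
KEEP_ESCAPED = {"<", ">", "&", '"', "'"}


def unescape_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they denote."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point < 0x20:
            return match.group(0)  # leave control characters escaped
        try:
            char = chr(code_point)
        except (ValueError, OverflowError):
            return match.group(0)  # not a valid code point; leave untouched
        return match.group(0) if char in KEEP_ESCAPED else char

    return NUMERIC_REF.sub(replace, text)


# Hypothetical sample, assumed to resemble the escaped source text.
sample = "<p>s&#8217;est &#233;coul&#233; jusqu&#8217;&#xE0; son r&#233;veil &#38; plus</p>"
converted = unescape_numeric_refs(sample)
print(converted)  # <p>s’est écoulé jusqu’à son réveil &#38; plus</p>

# A conformant parser sees the same text before and after the conversion.
print(ET.fromstring(sample).text == ET.fromstring(converted).text)  # True
```

For a whole file, you would read it with open(path, encoding="utf-8"), pass the contents through the function, and write the result back with the same encoding, so the bytes on disk match the utf-8 declaration in the header.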
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com