Convert Diacritical Unicode Character and Punctuation Codes
Posted: Sat Nov 27, 2010 7:28 am
I have a TEI Tite XML file containing a French text with thousands of diacritical characters. Though the document header declares it is encoded in UTF-8...
Code:
<?xml version="1.0" encoding="utf-8"?>
... the codes for diacritical characters and punctuation are in a different format, i.e.:
Code:
s’est écoulé jusqu’à son réveil; mais leurs rangs peuvent se mêler
I am trying to use the PhiloLogic text mining tool to analyze this text, but it won't find diacritical characters or punctuation unless they're in UTF-8.
How do I convert only the diacritical character and punctuation codes, as above, to UTF-8 codes?
Thanks,
Jeff
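One possible approach, as a minimal sketch only. It assumes the codes in question are XML numeric character references such as &#233; or &#xE9;. The sample line in the post shows the characters already decoded, so the actual format in the file is a guess. The function names decode_numeric_refs and convert_file are hypothetical, not part of PhiloLogic or any TEI tool. References that decode to markup-significant characters (&, <, >, quotes) are deliberately left escaped so the XML stays well-formed.

```python
import re

# Matches decimal (&#233;) and hexadecimal (&#xE9;) numeric character references.
# Assumption: this is the "different format" used in the file.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped so the XML remains well-formed.
XML_SPECIAL = {"&", "<", ">", '"', "'"}


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they name."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave out-of-range values and surrogates untouched; they cannot
        # be written out as UTF-8.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        char = chr(code_point)
        # Keep markup-significant characters in their escaped form.
        return match.group(0) if char in XML_SPECIAL else char

    return NUMERIC_REF.sub(replace, text)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path, decode its numeric references, write dst_path as UTF-8."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(decode_numeric_refs(text))
```

Usage would look something like convert_file("input.xml", "output.xml"), where both file names are placeholders. Writing to a separate output file keeps the original intact in case the guess about the format is wrong. Named entities such as &eacute; are not handled by this sketch and would need a separate mapping.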