Convert ASCII to UTF char

gardefjord
Posts: 3
Joined: Mon Nov 28, 2011 6:36 pm

Convert ASCII to UTF char

Post by gardefjord »

Hi all,
I have HTML files that contain a lot of numeric character references (ASCII escapes such as &#x00E4;) instead of the actual characters, like so:

Code: Select all

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sv">
<head>
<title>De andra</title>
<link rel="stylesheet" href="Styles.css" type="text/css" />
<link rel="stylesheet" type="application/vnd.adobe-page-template+xml" href="page-template.xpgt" />
</head>
<body>
<div class="booksection">
<h1 id="ch001"><a id="page_011"></a>Molly Beslutet</h1>
<p class="noindent_j1">N&#x00E4;r Molly vaknade str&#x00E4;ckte hon ut ena armen mot den andra kudden. Den var lika tom som den varit det senaste halv&#x00E5;ret. Ingen kind att smeka, ingen kropp att krypa intill. Pelle fanns helt enkelt inte d&#x00E4;r.</p>
<p class="indent_j">Hon satte sig upp och sl&#x00E4;ppte ner f&#x00F6;tterna i f&#x00E5;rskinnsf&#x00E4;llen. Den mjuka, lockiga k&#x00E4;nslan fick hennes kropp att l&#x00E5;ngsamt vakna. Hon tog ett par steg fram till f&#x00F6;nstret, &#x00F6;ppnade det och drog f&#x00F6;rsiktigt in den kalla luften i lungorna. &#x00C4;ven om vintern h&#x00F6;ll p&#x00E5; att sl&#x00E4;ppa sitt grepp och det mesta av sn&#x00F6;n hade sm&#x00E4;lt undan var morgnarna fortfarande svartm&#x00E5;lade. Molly huttrade och drog igen f&#x00F6;nstret.</p>
<p class="indent_j">I k&#x00F6;ket sl&#x00E4;ngde hon n&#x00E5;gra vedklampar i spisen och kaminen. Det k&#x00E4;ndes som om hon inte hade gjort n&#x00E5;got annat den sista tiden &#x00E4;n huggit ved och eldat upp den igen.</p>
Does anyone know how I can convert all of these ASCII escapes to normal UTF-8 characters?
For example: &#x00E5; = å
Radu
Posts: 9059
Joined: Fri Jul 09, 2004 5:18 pm

Re: Convert ASCII to UTF char

Post by Radu »

Hi Adam,

Just open the document in the Oxygen Text page, select all, right click, choose Source->Unescape selection and check the Unescape Characters check box.
Or if you want to automate this you could apply an XSLT stylesheet on it which just copies the XML content to the output.
An XSLT stylesheet which just copies the XML content has the following content:

Code: Select all

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8"/>
<!-- Match document -->
<xsl:template match="/">
<xsl:apply-templates mode="copy" select="."/>
</xsl:template>
<!-- Deep copy template -->
<xsl:template match="*|text()|@*" mode="copy">
<xsl:copy>
<xsl:apply-templates mode="copy" select="@*"/>
<xsl:apply-templates mode="copy"/>
</xsl:copy>
</xsl:template>
<!-- Handle default matching -->
<xsl:template match="*"/>
</xsl:stylesheet>
The XML parser expands the character references when it reads the document, so the stylesheet's output contains the actual characters.
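
For batch processing outside the editor, the same conversion can also be sketched as a small script. The following is only a minimal sketch in Python, not something Oxygen provides: the function name `unescape_char_refs` is hypothetical, and it assumes the files can be treated as plain text in which every numeric character reference should be replaced, except those for characters that must stay escaped to keep the markup well-formed.

```python
import re

# Characters that must stay escaped so the markup remains well-formed.
_KEEP_ESCAPED = {"&", "<", ">", '"', "'"}

# Matches hexadecimal (&#x00E4;) and decimal (&#228;) character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def unescape_char_refs(markup: str) -> str:
    """Replace numeric character references with the characters they denote."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
        except (ValueError, OverflowError):
            return match.group(0)  # not a valid code point: leave untouched
        # Keep references to markup-significant characters as they are.
        return match.group(0) if char in _KEEP_ESCAPED else char

    return _CHAR_REF.sub(replace, markup)


sample = "<p>N&#x00E4;r Molly vaknade str&#x00E4;ckte hon ut ena armen.</p>"
print(unescape_char_refs(sample))
# prints: <p>När Molly vaknade sträckte hon ut ena armen.</p>
```

Unlike the stylesheet, this text-based approach does not parse the XML, so it leaves the DOCTYPE and everything else in the file byte-for-byte unchanged. The result should be written back with UTF-8 encoding.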

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Re: Convert ASCII to UTF char

Post by gardefjord »

Thanks Radu!
Loving Oxygen XML Editor so far...
One more question though, how do I handle "&" signs?
Example:

Code: Select all

<p class="indent_j">– Jag skall ta strid mot dig var du än dyker upp, Ragnar & Surteson, viskade hon och spegelbilden upprepade löftet or dagrant.</p>
I get this error:

Code: Select all

F [Xerces] The entity name must immediately follow the '&' in the entity reference.

Re: Convert ASCII to UTF char

Post by Radu »

Hi Adam,

According to the XML specification, a literal "&" character is not allowed in XML text content and must always be escaped as &amp;.
When you unescape the entire text content, the Unescape selection dialog has a checkbox that can be unchecked to leave &amp; untouched.
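
If many files already contain bare ampersands, they could be escaped in bulk before the files are parsed. Here is a minimal sketch in Python, assuming the files are processed as plain text and contain no CDATA sections or comments with ampersands that should stay as they are. The function name `escape_bare_ampersands` is hypothetical.

```python
import re

# An "&" that does NOT start a character reference (&#228; or &#x00E4;)
# or a named entity reference (&amp;, &lt;, ...).
_BARE_AMP = re.compile(
    r"&(?!(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][\w.:-]*);)"
)


def escape_bare_ampersands(markup: str) -> str:
    """Escape every bare '&' as '&amp;', leaving existing references alone."""
    return _BARE_AMP.sub("&amp;", markup)


print(escape_bare_ampersands("<p>Ragnar & Surteson</p>"))
# prints: <p>Ragnar &amp; Surteson</p>
print(escape_bare_ampersands("<p>a &amp; b, &#x00E4;</p>"))
# prints: <p>a &amp; b, &#x00E4;</p>
```

The pattern only checks that an ampersand is followed by something shaped like a reference, so text such as "AT&T;" would be left alone even though "T" is not a declared entity.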

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Re: Convert ASCII to UTF char

Post by gardefjord »

Thank you very much! Problem totally solved.