utf-8 to Windows-1252 encoding with XSL

Posted: Tue Feb 26, 2013 7:07 pm
by 6thChild
Hi everyone,

I have an XSL transformation which reads an XML file encoded in UTF-8 and writes a text file which must be encoded in Windows-1252.
So I wrote the following line in my transformation.
<xsl:output method="text" encoding="Windows-1252" />

Everything was working fine until I ran into a character in the UTF-8 input that does not exist in Windows-1252.
It causes a fatal error.

I can't find a way to tell the processor to skip any character that is unavailable in Windows-1252, and I have no idea how to solve this problem.
Any ideas?

Thanks for your help!

Guillaume

Re: utf-8 to Windows-1252 encoding with XSL

Posted: Mon Mar 04, 2013 4:19 pm
by adrian
Hello,

If it's for specific characters, you can use xsl:character-map:

<xsl:character-map name="a">
  <xsl:output-character character="&lt;" string="&amp;lt;"/>
  <xsl:output-character character="&gt;" string="&amp;gt;"/>
</xsl:character-map>
<xsl:output method="text" use-character-maps="a"/>

But in this case you need to cover a large character range (U+0100 to U+FFFD), so this example is better suited:
http://stackoverflow.com/questions/1079 ... iven-range
Use the range 256-65533 (65534/U+FFFE and 65535/U+FFFF are not allowed in XML).
e.g.

<xsl:template match="text()">
  <xsl:analyze-string select="." regex="[&#256;-&#65533;]">
    <xsl:matching-substring>
      <!-- Skip -->
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

Regards,
Adrian

Later Edit:
Skipped the matching chars in code above.
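
If a few of the unsupported characters are worth keeping in some readable form rather than dropping, the character-map approach mentioned at the top can substitute a fallback string for each one. The following is only a sketch: the map name and the three sample characters are illustrative assumptions, not something taken from the original stylesheet, and it assumes an XSLT 2.0 processor that applies character maps before encoding the output, as the serialization rules describe.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text output in Windows-1252, with a character map applied first -->
  <xsl:output method="text" encoding="Windows-1252"
              use-character-maps="ascii-fallbacks"/>

  <!-- Hypothetical map: each character below has no Windows-1252
       equivalent and is written out as an ASCII fallback instead -->
  <xsl:character-map name="ascii-fallbacks">
    <!-- U+2192 RIGHTWARDS ARROW becomes "->" -->
    <xsl:output-character character="&#x2192;" string="-&gt;"/>
    <!-- U+2264 LESS-THAN OR EQUAL TO becomes "<=" -->
    <xsl:output-character character="&#x2264;" string="&lt;="/>
    <!-- U+2265 GREATER-THAN OR EQUAL TO becomes ">=" -->
    <xsl:output-character character="&#x2265;" string="&gt;="/>
  </xsl:character-map>

</xsl:stylesheet>
```

Any unsupported character that is not listed in the map would still trigger the original error, so this only makes sense when the set of problem characters is small and known in advance.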

Re: utf-8 to Windows-1252 encoding with XSL

Posted: Wed Mar 06, 2013 5:20 pm
by 6thChild
Hello,

It worked, thank you very much Adrian!

I wrote the following line to erase characters that weren't allowed in Windows-1252:
<xsl:value-of select="replace(., '[&#256;-&#8363;]|[&#8365;-&#65533;]', '')"/>
I excluded the euro sign (€, &#8364;) from the ranges so that it is kept, since Windows-1252 can encode it.
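
The euro sign is not the only character above U+00FF that Windows-1252 can encode: there are 27 of them in total, including the curly quotes, the en and em dashes, the ellipsis, Œ/œ and ™. A possible refinement, sketched below, is to keep all of them by using the character class subtraction syntax of XPath 2.0 regular expressions. The variable name is made up for illustration, and treating the C1 controls (U+0080 to U+009F) and the characters beyond U+FFFF as unencodable is an assumption that may need adjusting for a given processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="Windows-1252"/>

  <!-- Hypothetical helper: a regex matching characters assumed to be
       unencodable in Windows-1252. The first part lists candidate
       ranges (C1 controls, U+0100 to U+FFFD, supplementary planes);
       the subtracted class lists the 27 characters above U+00FF that
       Windows-1252 does define, so they are kept. -->
  <xsl:variable name="not-cp1252" select="concat(
      '[&#x80;-&#x9F;&#x100;-&#xFFFD;&#x10000;-&#x10FFFF;',
      '-[&#x152;&#x153;&#x160;&#x161;&#x178;&#x17D;&#x17E;&#x192;',
      '&#x2C6;&#x2DC;&#x2013;&#x2014;&#x2018;-&#x201A;&#x201C;-&#x201E;',
      '&#x2020;-&#x2022;&#x2026;&#x2030;&#x2039;&#x203A;&#x20AC;&#x2122;]]')"/>

  <!-- Drop every matching character from each text node; elements are
       handled by the built-in template rules -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., $not-cp1252, '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the pattern in one variable means the same expression can be reused if attribute values or other strings also need to be cleaned.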

Regards,
Guillaume