
Re: [xsl] Character Encoding Problem


Subject: Re: [xsl] Character Encoding Problem
From: "Tony Graham tgraham@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 25 Sep 2014 15:11:32 -0000

On Thu, September 25, 2014 11:32 am, Tony Graham tgraham@xxxxxxxxxx wrote:
> On Tue, September 23, 2014 9:32 pm, Michael Kay mike@xxxxxxxxxxxx wrote:
>> On 23 Sep 2014, at 21:23, Craig Sampson craig.sampson@xxxxxxx
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>   We're trying to create a Java properties file using XSLT 2.0 in
>>> Saxon.
>>> The input is XML encoded as UTF-8. The properties file needs to be
>>> encoded as ISO-8859-1. The character giving the problem, in the input
>>> file, is &#x201c; which is a left-hand double quote. Looking at the
>>> ISO-8859-1 character set, the closest character appears to be a double
>>> quote with no hand (left/right).
>
> To move the goalposts,

Since I inadvertently ended up repeating most of Wolfgang Laun's advice,
let me try again with something more original:

----
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:m="http://www.mentea.net/namespace"
    version="2.0"
    exclude-result-prefixes="m xs">

<xsl:output method="text" encoding="ISO-8859-1" />

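<!-- Escape every character that ISO-8859-1 cannot represent, i.e. any
     character above U+00FF; everything else is copied through as-is. -->
<xsl:template match="text()">
  <xsl:analyze-string select="."
                      regex="[&#x100;-&#x10FFFF;]">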
    <xsl:matching-substring>
      <xsl:value-of select="m:escape(.)" />
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

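<!-- Escape one character as a Java \uXXXX sequence. \u takes exactly four
     hex digits, so characters above U+FFFF become a UTF-16 surrogate pair. -->
<xsl:function name="m:escape" as="xs:string">
  <xsl:param name="char" as="xs:string" />

  <xsl:variable name="cp" select="string-to-codepoints($char)"
                as="xs:integer" />

  <xsl:sequence
      select="if ($cp gt 65535)
                then concat(m:escape-unit(55296 + ($cp - 65536) idiv 1024),
                            m:escape-unit(56320 + ($cp - 65536) mod 1024))
              else m:escape-unit($cp)" />
</xsl:function>

<!-- Format one 16-bit code unit as \u plus four zero-padded hex digits. -->
<xsl:function name="m:escape-unit" as="xs:string">
  <xsl:param name="unit" as="xs:integer" />

  <xsl:variable name="hex-chars" select="m:to-hex($unit)" as="xs:string+" />

  <xsl:sequence
      select="string-join(('\u',
                           substring('000', count($hex-chars)),
                           $hex-chars),
                          '')" />
</xsl:function>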

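<!-- Return the hex digits of a non-negative integer, most significant
     digit first, as a sequence of one-character strings. -->
<xsl:function name="m:to-hex" as="xs:string+">
  <xsl:param name="codepoint" as="xs:decimal" />

  <!-- Recurse on the higher-order digits first, if there are any. -->
  <xsl:sequence
      select="if ($codepoint >= 16)
                then m:to-hex(floor($codepoint div 16))
              else ()" />

  <!-- Then emit the lowest-order digit. -->
  <xsl:sequence select="substring('0123456789ABCDEF',
                                  ($codepoint mod 16) + 1, 1)" />
</xsl:function>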

</xsl:stylesheet>
----

(though it does borrow from and correct
http://www.biglist.com/lists/xsl-list/archives/200012/msg00426.html).
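
For anyone who wants to check the escaping without a real input file, a
minimal sketch of a test harness follows. It assumes the stylesheet above
has been saved as escape.xsl; that filename and the template name "check"
are made up for illustration.

----
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://www.mentea.net/namespace"
    version="2.0"
    exclude-result-prefixes="m">

<!-- Assumption: the stylesheet above is saved as escape.xsl
     (a hypothetical name) in the same directory. -->
<xsl:import href="escape.xsl" />

<!-- Named entry point, so no source document is needed. -->
<xsl:template name="check">
  <!-- U+201C needs all four hex digits. -->
  <xsl:value-of select="m:escape('&#x201C;')" />
  <xsl:text>&#10;</xsl:text>
  <!-- U+0141 has three hex digits, so one padding zero is added. -->
  <xsl:value-of select="m:escape('&#x141;')" />
</xsl:template>

</xsl:stylesheet>
----

Run with "check" as the initial template (with Saxon 9 that should be
something like "-it:check -xsl:check.xsl" on the command line), this
should print \u201C and \u0141 on separate lines.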

Regards,


Tony Graham                                         tgraham@xxxxxxxxxx
Consultant                                       http://www.mentea.net
Chair, Print and Page Layout Community Group @ W3C    XML Guild member
  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
Mentea       XML, XSL-FO and XSLT consulting, training and programming

