
Re: [xsl] Character Encoding Problem


Subject: Re: [xsl] Character Encoding Problem
From: "Wolfgang Laun wolfgang.laun@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 25 Sep 2014 15:47:06 -0000

The lower bound of
   regex="[&#x100;-&#x10FFFF;]"
should be set to &#x80; (or &#xC0; if you want to be finicky).
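As a rough sketch, assuming the rest of Tony's stylesheet below stays as
posted, the template would then read:

----
<xsl:template match="text()">
  <!-- Lower bound changed from &#x100; to &#x80;: everything outside
       ASCII now goes through m:escape -->
  <xsl:analyze-string select="."
                      regex="[&#x80;-&#x10FFFF;]">
    <xsl:matching-substring>
      <xsl:value-of select="m:escape(.)" />
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
----

Note that this also escapes the Latin-1 range U+0080 to U+00FF, which
ISO-8859-1 could carry unescaped, so the output is plain ASCII.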
Cheers
-W


On 25 September 2014 17:12, Tony Graham tgraham@xxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, September 25, 2014 11:32 am, Tony Graham tgraham@xxxxxxxxxx wrote:
> > On Tue, September 23, 2014 9:32 pm, Michael Kay mike@xxxxxxxxxxxx wrote:
> >> On 23 Sep 2014, at 21:23, Craig Sampson craig.sampson@xxxxxxx
> >> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>   We're trying to create a Java properties file using XSLT 2.0 in
> >>> Saxon.
> >>> The input is XML encoded as UTF-8. The properties file needs to be
> >>> encoded as ISO-8859-1. The character giving the problem, in the input
> >>> file, is &#x201c;, a left double quotation mark. Looking at the
> >>> ISO-8859-1 character set, the closest character appears to be the plain
> >>> double quote, which has no left or right form.
> >
> > To move the goalposts,
>
> Since I inadvertently ended up repeating most of Wolfgang Laun's advice,
> let me try again with something more original:
>
> ----
> <xsl:stylesheet
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:m="http://www.mentea.net/namespace"
>     version="2.0"
>     exclude-result-prefixes="m xs">
>
> <xsl:output method="text" encoding="ISO-8859-1" />
>
> <xsl:template match="text()">
>   <xsl:analyze-string select="."
>                       regex="[&#x100;-&#x10FFFF;]">
>     <xsl:matching-substring>
>       <xsl:value-of select="m:escape(.)" />
>     </xsl:matching-substring>
>     <xsl:non-matching-substring>
>       <xsl:value-of select="." />
>     </xsl:non-matching-substring>
>   </xsl:analyze-string>
> </xsl:template>
>
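> <!-- Turn a single character into \u plus its hex code point, zero-padded
>      to at least four digits. Code points above U+FFFF produce five or
>      six digits. -->
> <xsl:function name="m:escape" as="xs:string">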
>   <xsl:param name="char" as="xs:string" />
>
>   <xsl:variable name="hex-chars"
>                 select="m:to-hex(string-to-codepoints($char))"
>                 as="xs:string+" />
>
>   <xsl:sequence
>       select="string-join(('\u',
>                            substring('000', count($hex-chars)),
>                            $hex-chars),
>                           '')" />
> </xsl:function>
>
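> <!-- Recursively convert a code point to a sequence of upper-case hex
>      digits, most significant digit first. -->
> <xsl:function name="m:to-hex" as="xs:string+">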
>   <xsl:param name="codepoint" as="xs:decimal" />
>
>   <xsl:sequence
>       select="if ($codepoint >= 16)
>                 then m:to-hex(floor($codepoint div 16))
>               else ()" />
>
>   <xsl:sequence select="substring('0123456789ABCDEF',
>                                   ($codepoint mod 16) + 1, 1)" />
> </xsl:function>
>
> </xsl:stylesheet>
> ----
>
> (though it does borrow from and correct
> http://www.biglist.com/lists/xsl-list/archives/200012/msg00426.html).
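>
> One caveat, offered as an untested sketch rather than a fix: Java's \u
> escape takes exactly four hex digits, so a character above U+FFFF has to
> be written as a UTF-16 surrogate pair. Assuming two hypothetical helpers,
> m:escape-unit and m:escape-wide (neither is in the stylesheet above),
> that could look like:
>
> ----
> <!-- Hypothetical helper: format one UTF-16 code unit as \uXXXX -->
> <xsl:function name="m:escape-unit" as="xs:string">
>   <xsl:param name="unit" as="xs:integer" />
>
>   <xsl:variable name="hex" select="m:to-hex($unit)" as="xs:string+" />
>
>   <xsl:sequence
>       select="string-join(('\u',
>                            substring('000', count($hex)),
>                            $hex),
>                           '')" />
> </xsl:function>
>
> <!-- Hypothetical helper: like m:escape, but splits code points above
>      U+FFFF (65535) into a high and a low surrogate -->
> <xsl:function name="m:escape-wide" as="xs:string">
>   <xsl:param name="char" as="xs:string" />
>
>   <xsl:variable name="cp" select="string-to-codepoints($char)"
>                 as="xs:integer" />
>
>   <!-- 55296 = 0xD800, 56320 = 0xDC00, 65536 = 0x10000, 1024 = 0x400 -->
>   <xsl:sequence
>       select="if ($cp gt 65535)
>                 then concat(
>                   m:escape-unit(55296 + (($cp - 65536) idiv 1024)),
>                   m:escape-unit(56320 + (($cp - 65536) mod 1024)))
>               else m:escape-unit($cp)" />
> </xsl:function>
> ----
>
> With m:escape-wide(.) in place of m:escape(.) in the template, U+1F600
> would come out as \uD83D\uDE00.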
>
> Regards,
>
>
> Tony Graham                                         tgraham@xxxxxxxxxx
> Consultant                                       http://www.mentea.net
> Chair, Print and Page Layout Community Group @ W3C    XML Guild member
>   --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
> Mentea       XML, XSL-FO and XSLT consulting, training and programming

