
Re: [xsl] Combining use-character-maps and normalization-form="NFC" attributes produce unwanted output


Subject: Re: [xsl] Combining use-character-maps and normalization-form="NFC" attributes produce unwanted output
From: "Wolfgang Laun wolfgang.laun@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Feb 2016 15:41:55 -0000

Even a lone identity mapping of the semicolon (U+003B),
     <xsl:output-character character=";" string=";"/>
results in every semicolon being translated to U+037E. This looks like a bug.

 Tested with Saxon-HE 9.6.0.1
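
As a cross-check outside Saxon, here is a minimal Python sketch (standard library only) of what NFC does to the characters in question. The names CHARACTER_MAP, code_points and expected_output are hypothetical stand-ins for the poster's character map, not anything taken from Saxon, and the sketch does not try to model the order of the serializer's phases; for these characters the order makes no difference. U+037E has a canonical singleton decomposition to U+003B, so NFC can turn a Greek question mark into a semicolon but should never go the other way.

```python
import unicodedata

# Hypothetical stand-in for the "unsupported_characters" character map:
# both curly double quotes are replaced by a plain quotation mark.
CHARACTER_MAP = str.maketrans({"\u201C": '"', "\u201D": '"'})


def code_points(text):
    # Render a string in the thread's notation: \u0022 + \u003B + ...
    return " + ".join(f"\\u{ord(ch):04X}" for ch in text)


def expected_output(text):
    # Apply the character map, then normalise the result to NFC.
    return unicodedata.normalize("NFC", text.translate(CHARACTER_MAP))


source = "\u201D\u003B\u0020\u003B"  # the poster's input text

# The semicolons survive unchanged; only the quote is replaced.
print(code_points(expected_output(source)))
# prints: \u0022 + \u003B + \u0020 + \u003B

# NFC maps the Greek question mark to the semicolon, never the reverse.
print(code_points(unicodedata.normalize("NFC", "\u037E")))
# prints: \u003B
```

If this reading is right, an output containing U+037E cannot be the result of a correct NFC step applied to this input, which is consistent with the behaviour being a bug in the processor.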


On 12 February 2016 at 15:29, lancelot.meurillon@xxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> XSL processor : Saxon-EE 9.5.1.8J from Saxonica
>
> XSL version : 2.0
>
>
>
> Dear all,
>
>
>
> For various reasons, I need to escape specific characters in the output and
> also need to produce Unicode normalised to NFC.
>
> Here is my input:
>
> <inputText>”; ;</inputText>  => which is \u201D + \u003B + \u0020 +
> \u003B
>
>
>
> Here are the output properties of my stylesheet:
>
> <xsl:output method="xml" version="1.0" encoding="UTF-8"
>         indent="yes" omit-xml-declaration="no"
>         use-character-maps="unsupported_characters"
>         normalization-form="NFC"
>     />
>
>
>
> The character-map definition:
>
> <xsl:character-map name="unsupported_characters">
>         <xsl:output-character character="&#8220;" string="&quot;"/>
>         <xsl:output-character character="&#8221;" string="&quot;"/>
>     </xsl:character-map>
>
>
>
> With this template:
>
> <xsl:template match="/">
>     <shortDescription><xsl:value-of select="inputText"/></shortDescription>
> </xsl:template>
>
>
>
> Now the output:
>
> <shortDescription>"; ;</shortDescription> => which is \u0022 + \u037E +
> \u0020 + \u003B
>
>
>
> Why is the semicolon (\u003B) directly after the escaped quote translated
> into a Greek question mark (\u037E), while the next semicolon is kept?
>
> But the real question is: why is my semicolon turned into a Greek question
> mark at all?
>
>
>
> Some further observations:
>
> 1- If I do not use the character map, the result is:
>
> <shortDescription>”; ;</shortDescription> => which is \u201D + \u003B +
> \u0020 + \u003B
>
>
>
> 2- If I do not normalize the Unicode (that is, without the
> normalization-form="NFC" attribute), the result is:
>
> <shortDescription>"; ;</shortDescription> => which is \u0022 + \u003B +
> \u0020 + \u003B
>
>
>
> Thanks for the help
>
> Lancelot

