
Re: Special entity characters in Shift-JIS XSL.


Subject: Re: Special entity characters in Shift-JIS XSL.
From: "Nikolai Grigoriev" <grig@xxxxxxx>
Date: Fri, 17 Dec 1999 04:39:41 +0300

David Carlisle wrote:


>which spec? there is nothing that could be put into the xsl spec, as
>what you are asking for is a change in XML 1.0, this is why your
>suggested markup of using &# syntax will always be fragile and flaky.
>As soon as your documents are touched by any xml parser the characters
>may (or may not) be written out as character data in the document
>encoding rather than as character references, since the xml spec makes
>it explicit that these are equivalent when used in element character
>data.
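David's point can be checked with any conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree`, chosen only for illustration; it is not one of the processors discussed in this thread. A character reference and the literal character produce identical character data, and the serializer's output encoding, not the source markup, decides which form comes back out.

```python
import xml.etree.ElementTree as ET

# The same Cyrillic letter U+041F, written two ways in the source.
as_reference = "<p>&#1055;</p>"  # numeric character reference
as_literal = "<p>\u041f</p>"     # literal character

ref_elem = ET.fromstring(as_reference)
lit_elem = ET.fromstring(as_literal)

# After parsing, the two are indistinguishable: the parser keeps no
# record of which form the source used.
print(ref_elem.text == lit_elem.text)  # True

# On output, the chosen encoding decides the form. A Unicode serialization
# writes the literal letter even though the source used a reference...
print(ET.tostring(ref_elem, encoding="unicode"))   # <p>П</p>
# ...while a US-ASCII serialization writes a reference even though the
# source used the literal letter.
print(ET.tostring(lit_elem, encoding="us-ascii"))  # b'<p>&#1055;</p>'
```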


I have much the same problems with Russian texts as Sean O'Dell
has with Shift-JIS. For Russian, there are two major 8-bit encoding
schemes plus two minor ones; UTF-8 is scarcely used because it
doubles the length of Cyrillic text. Sure enough, none of the 8-bit Russian
encodings are supported by currently available XSLT processors. Well,
I can change the encoding declaration to ISO-8859-1 and let the whole
text be parsed without errors. But outputting the processing results as UTF-8
is disastrous: what I get is "KOI8-R converted to UTF-8 as if it were
Latin-1", too strong for poor me.
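To make the KOI8-R failure concrete, here is a small Python sketch of the byte-level round trip. The sample word is a hypothetical placeholder; `koi8_r` and `iso-8859-1` are Python's standard codec names.

```python
original = "Привет"  # hypothetical sample text ("hello")

# The file on disk is KOI8-R: one byte per Cyrillic letter, all >= 0x80.
koi8_bytes = original.encode("koi8_r")

# Declaring it as ISO-8859-1 makes the parser read each byte as a Latin-1
# character. Every byte value is valid Latin-1, so there is no error...
misread = koi8_bytes.decode("iso-8859-1")

# ...but writing that result as UTF-8 encodes each of those Latin-1
# characters as two bytes, producing accented Latin gibberish, not Cyrillic.
utf8_output = misread.encode("utf-8")
print(len(koi8_bytes), len(utf8_output))        # 6 12
print(utf8_output.decode("utf-8") == original)  # False

# Undoing both steps recovers the text, but only if you know what happened.
recovered = utf8_output.decode("utf-8").encode("iso-8859-1").decode("koi8_r")
print(recovered == original)  # True
```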

I admire James Clark's XT, but I can hardly use it for Russian, because
there's no way to make it output anything but UTF-8. Fortunately,
there is SAXON, which supports Latin-1 output and lets me pass
my weird letters through ;-). Thanks to Mike Kay!

I think a universal solution would be proper support for US-ASCII
as an output encoding. The serializer would then write every character
outside the 7-bit range as a numeric character reference, which is exactly
what Sean needs. This is often the preferred solution for non-Latin-1
encodings that are unlikely to be supported by common tools in the near
future. It's a pity that the XML spec does not require this for conformance.
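A minimal sketch of the escaping step described above, assuming Python and its built-in `xmlcharrefreplace` codec error handler. The function name `to_ascii_with_char_refs` is hypothetical, and the sketch assumes the markup-significant characters (`&`, `<`, `>`) have already been escaped by the serializer; it handles only the encoding step.

```python
def to_ascii_with_char_refs(text: str) -> str:
    """Hypothetical helper: keep 7-bit characters as they are and write
    every other character as a decimal numeric character reference.

    Assumes &, < and > in `text` have already been escaped.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_with_char_refs("Привет, world"))
# &#1055;&#1088;&#1080;&#1074;&#1077;&#1090;, world
print(to_ascii_with_char_refs("日本"))  # &#26085;&#26412;
```

The output is plain 7-bit ASCII, so it can be declared as US-ASCII, and a conforming parser reads the references back as the original Cyrillic or Japanese characters.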

SAXON sort of does this already: it warns that the US-ASCII encoding
is not supported and threatens to switch to UTF-8, but it still writes all
special characters as numeric character references. Thanks again, Mike ;-).

Regards,
Nikolai



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


