Re: [xsl] special character encoding, two problems

Subject: Re: [xsl] special character encoding, two problems
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 16 Oct 2014 14:17:20 -0000

By "is otherwise a Unicode document" I really meant "is otherwise a
Unicode byte sequence as serialized". Of course all XML documents consist
of Unicode characters by definition.

In the case where the encoding of the bytes is a Unicode encoding, numeric
character references are *never* necessary. In the case where the encoding
of the bytes is not a Unicode encoding and there is no one-to-one
correspondence between characters in the encoding used and Unicode
characters, numeric character references will be required for any
characters the encoding cannot represent, e.g., when some form of ASCII is
used as the encoding.
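To make that concrete, here is a minimal sketch using Python's standard
xml.etree.ElementTree (my choice of tool for illustration, nothing specific
to this thread). It serializes the same element once as UTF-8 and once as
US-ASCII; only the ASCII output should need a character reference. The
element name and sample text are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element containing one non-ASCII character (U+00E9).
elem = ET.Element("p")
elem.text = "caf\u00e9"

# Unicode encoding: the character is written directly as UTF-8 bytes.
utf8_bytes = ET.tostring(elem, encoding="utf-8")

# Non-Unicode encoding: the serializer falls back to a numeric
# character reference for anything ASCII cannot represent.
ascii_bytes = ET.tostring(elem, encoding="us-ascii")

print(utf8_bytes)
print(ascii_bytes)

# Both byte sequences parse back to the same Unicode string.
same_text = ET.fromstring(utf8_bytes).text == ET.fromstring(ascii_bytes).text
print(same_text)
```

Either way the parsed document contains the same characters; only the
serialized bytes differ.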

But in that case, it still shouldn't matter (and certainly won't to any
conforming XML parser) what the details of the numeric character
references are: 2-digit hex, 4-digit hex, or decimal values.
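A quick way to check this, sketched with Python's standard
xml.etree.ElementTree (an assumption on my part; any conforming parser
should behave the same way): parse the same text with the character
written as 2-digit hex, 4-digit hex, and decimal, and compare the results.
The sample element is invented for the example.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character, U+00E9:
# 2-digit hex, 4-digit hex (leading zeros), and decimal.
reference_forms = ["&#xE9;", "&#x00E9;", "&#233;"]

# Parse one small document per spelling and collect the resulting text.
parsed_texts = [
    ET.fromstring(f"<p>caf{ref}</p>").text for ref in reference_forms
]

# A set collapses identical strings, so one member means the parser
# saw the same character in all three cases.
distinct_results = set(parsed_texts)
print(distinct_results)
```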

If you know that non-XML-aware tools may be operating on your XML byte
sequences, essentially treating bytes that should be consumed as UTF-8 as
if they were ASCII, then it is probably necessary to use numeric character
references, but in that case you should probably just use the appropriate
non-Unicode encoding, because then everything is clear.
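As a rough illustration of that failure mode, the sketch below uses only
Python's built-in codecs. The function naive_ascii_tool is hypothetical
and stands in for any tool that assumes its input is ASCII; the sample
markup is made up as well.

```python
# Hypothetical document text with one non-ASCII character (U+00E9).
document = "<p>caf\u00e9</p>"

# Serialized as UTF-8: the character becomes two bytes >= 0x80.
utf8_bytes = document.encode("utf-8")

# Serialized as ASCII, with unrepresentable characters replaced by
# numeric character references.
ascii_bytes = document.encode("ascii", errors="xmlcharrefreplace")


def naive_ascii_tool(data: bytes) -> str:
    """Stand-in for a non-XML-aware tool that assumes ASCII input."""
    return data.decode("ascii")


# The ASCII serialization passes through such a tool unharmed.
ascii_result = naive_ascii_tool(ascii_bytes)

# The UTF-8 serialization does not.
try:
    naive_ascii_tool(utf8_bytes)
    utf8_survived = True
except UnicodeDecodeError:
    utf8_survived = False

print(ascii_result, utf8_survived)
```

A real tool might silently corrupt the bytes rather than raise an error,
which is arguably worse.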


Eliot Kimber, Owner
Contrext, LLC

On 10/16/14, 5:56 AM, "Wolfgang Laun wolfgang.laun@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>On 15 October 2014 21:23, Eliot Kimber ekimber@xxxxxxxxxxxx
><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>It also shouldn't matter how the characters are encoded if the XML
>>document is otherwise a Unicode document (e.g., encoded in UTF-8 or
>
>XML requires Unicode (neither "if" nor "otherwise"). The encoding is not
>necessarily one of the UTF-*-encodings, which, I'm sure, the "e.g." was
>not meant to imply.
>
>I also wonder about the requirement to have numeric entities in hex.