
Re: [xsl] Re: XML/XHTML fragment to text


Subject: Re: [xsl] Re: XML/XHTML fragment to text
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Thu, 16 Aug 2007 16:42:52 +0200

Abel Braaksma wrote:

As far as I could tell, one attribute on xsl:output always causes problems, namely the following:


* byte-order-mark

When you use it together with UTF-8 it will offset the amount by one. This is because the byte order mark (xFEFF), when interpreted as a string, will be translated into the equivalent string representation in UTF-8, which is the byte sequence xEFBBBF, now representing the codepoint 65279 (U+FEFF) (Zero Width No Break Space, deprecated but allowed). This interpretation is in lieu of the Unicode recommendation. It is useless to put a BOM at the beginning of a UTF-8 stream, so it is best to avoid it.


Oh, I must be sleeping. The analysis above is correct, but the "offset by one" is correct as well. A UTF-8 byte stream will never start with the bytes xFFFE or xFEFF. When the BOM is present in UTF-8, it is (and must be) encoded as the three bytes xEFBBBF, that is, the UTF-8 representation of U+FEFF. Ergo: the total amount (plus three for the BOM) is correct. Ergo: there are no mistakes in calculation when using the mentioned approach.
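
For anyone who wants to check the arithmetic outside XSLT, here is a minimal sketch in Python (not part of the original approach; the payload "abc" and the variable names are made up for illustration). It only shows that U+FEFF is one character and that its UTF-8 encoding is three bytes:

```python
import codecs

BOM = "\ufeff"   # U+FEFF, codepoint 65279
text = "abc"     # hypothetical payload, just for illustration

# Serialize with a leading BOM, as byte-order-mark="yes" would for UTF-8.
encoded = (BOM + text).encode("utf-8")

# The stream starts with EF BB BF, never with FE FF or FF FE.
starts_with_utf8_bom = encoded.startswith(codecs.BOM_UTF8)

# Byte count: payload bytes plus three for the BOM.
extra_bytes = len(encoded) - len(text.encode("utf-8"))

# A plain UTF-8 decode keeps U+FEFF as one leading character,
# so a character count is offset by one.
decoded_raw = encoded.decode("utf-8")
extra_chars = len(decoded_raw) - len(text)

# The "utf-8-sig" codec strips a leading BOM while decoding.
decoded_sig = encoded.decode("utf-8-sig")

print(starts_with_utf8_bom, extra_bytes, extra_chars, decoded_sig == text)
```

So whether the count comes out as "plus one" or "plus three" depends on whether you count characters or bytes.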

Sorry for cluttering this thread...

Cheers,
-- Abel Braaksma

