
Re: [xsl] Missing byte-order mark problem


Subject: Re: [xsl] Missing byte-order mark problem
From: Mike Brown <mike@xxxxxxxx>
Date: Sun, 3 Aug 2003 16:30:58 -0600 (MDT)

Vivek Shinde wrote:
> For last two days I was struggling with a problem of applying a 
> XSL stylesheet to XML that had Danish characters (using entities 
> like &#248; etc.). The output=HTML was working fine but when I 
> tried to get text output I kept getting "Missing byte-order mark".
> I tried it with encoding of UTF-8 as well as UTF-16, it did not work.
>  Finally I found a listing on google from this group from way back
> in 2002 http://www.xslt.com/xsl-list/2002-02/msg00675.html and it 
> suggested to use encoding="iso-8859-1" and walla...it worked.

Trial and error is not a very good way to go about document authoring
or XSLT programming.

In the prolog of an XML document, encoding="iso-8859-1" is an assertion that
the document's bytes map to Unicode characters according to the iso-8859-1
encoding. This declaration may be entirely false, as you may have saved the
document in UTF-8 or UTF-16 or some other format. The XML spec requires it to
be a truthful statement, though, so that an XML parser will know how to
interpret the bytes.
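To see why the declaration has to be truthful, here is a small sketch using
Python's standard xml.etree.ElementTree (which wraps the expat parser, not
whatever parser was used in the original question). The same two UTF-8 bytes
for "ø" come out as different characters depending on what the prolog claims.
The helper parse_text is hypothetical, written only for this illustration.

```python
import xml.etree.ElementTree as ET

# U+00F8 (LATIN SMALL LETTER O WITH STROKE) is two bytes in UTF-8: C3 B8.
OSLASH_UTF8 = "\u00f8".encode("utf-8")

def parse_text(declared_encoding: str, body: bytes) -> str:
    """Hypothetical helper: parse <a>body</a> under a prolog declaring declared_encoding."""
    prolog = f'<?xml version="1.0" encoding="{declared_encoding}"?>'.encode("ascii")
    return ET.fromstring(prolog + b"<a>" + body + b"</a>").text

# Truthful declaration: the two bytes decode to the single character U+00F8.
print(ascii(parse_text("utf-8", OSLASH_UTF8)))       # '\xf8'
# False declaration: each byte is read as its own iso-8859-1 character.
print(ascii(parse_text("iso-8859-1", OSLASH_UTF8)))  # '\xc3\xb8'
```

The parser has no way to notice the second case is wrong; it simply believes
the prolog and produces "Ã¸" instead of "ø".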

In the top-level xsl:output element, encoding="iso-8859-1" is there to
notify the XSLT processor that after it is done building the result tree,
you would like it to be serialized as bytes according to the iso-8859-1
encoding.
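The serialization step can be imitated outside XSLT. The sketch below uses
Python's ElementTree as a stand-in for an XSLT processor's serializer (an
assumption for illustration only): the finished tree holds characters, and
only the requested output encoding decides which bytes are written. The
helper serialize and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

def serialize(root: ET.Element, encoding: str) -> bytes:
    """Hypothetical helper: turn a finished tree into bytes in the given encoding."""
    return ET.tostring(root, encoding=encoding)

root = ET.Element("p")
root.text = "R\u00f8dgr\u00f8d \u20ac"  # "Rødgrød €"

latin1 = serialize(root, "iso-8859-1")
utf8 = serialize(root, "utf-8")

# ø is one byte (0xF8) in iso-8859-1 but two bytes (0xC3 0xB8) in UTF-8.
print(b"\xf8" in latin1, b"\xc3\xb8" in utf8)  # True True
# € has no iso-8859-1 byte; ElementTree writes a character reference instead.
print(b"&#8364;" in latin1)                    # True
```

The character-reference fallback shown here is ElementTree's behavior; an
XSLT processor may handle unencodable characters in its own way.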

"Missing byte-order mark" indicates that your XML parser is trying to read a
document under the assumption that it is utf-16 encoded (1 or 2 pairs of bytes
per character, plus a 2-byte byte-order mark at the beginning of the document
to indicate whether the low or high byte comes first in each pair), but you
are in fact feeding it a document that is iso-8859-1 or windows-1252 (or any
other non-BOM-using encoding) encoded.
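A simplified, hypothetical version of that check is sketched below in Python.
It is not the code of any particular parser; it only shows what a parser that
expects utf-16 is looking for in the first two bytes, and why an iso-8859-1
document (which starts with the plain bytes for "<?") fails it. The function
name sniff_utf16_byte_order is made up for this example.

```python
import codecs

def sniff_utf16_byte_order(data: bytes) -> str:
    """Hypothetical check: pick the UTF-16 byte order from the BOM, or fail."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff': high byte first
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe': low byte first
        return "utf-16-le"
    raise ValueError("Missing byte-order mark")

# One pair of bytes for ø, two pairs (a surrogate pair) for U+1D11E.
print(len("\u00f8".encode("utf-16-be")))      # 2
print(len("\U0001d11e".encode("utf-16-be")))  # 4

doc = '<?xml version="1.0"?><a>\u00f8</a>'
print(sniff_utf16_byte_order(codecs.BOM_UTF16_BE + doc.encode("utf-16-be")))  # utf-16-be

# The same document saved as iso-8859-1 begins with b'<?', not a BOM.
try:
    sniff_utf16_byte_order(doc.encode("iso-8859-1"))
except ValueError as err:
    print(err)  # Missing byte-order mark
```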

Most likely the cause of this is that your XML prolog contains an
encoding="utf-16" declaration (or you've somehow told the XML parser
externally that the document is utf-16), when the document is actually
iso-8859-1 or windows-1252 encoded.
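As a rough illustration of both the mistake and the fix, here is a sketch with
Python's ElementTree (expat), again assuming a different parser than the one in
the question, so the error text differs from "Missing byte-order mark". A
utf-16 declaration on iso-8859-1 bytes is rejected, while encoding the text
and writing a declaration that names that same encoding parses cleanly. The
helper with_prolog is hypothetical.

```python
import xml.etree.ElementTree as ET

def with_prolog(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode text and prepend a declaration naming that encoding."""
    prolog = f'<?xml version="1.0" encoding="{encoding}"?>'
    # Python's "utf-16" codec writes a BOM first, as the XML spec requires.
    return (prolog + text).encode(encoding)

body = "<name>S\u00f8ren</name>"

# Mismatch: the prolog claims utf-16, but the bytes are iso-8859-1.
mismatched = b'<?xml version="1.0" encoding="utf-16"?>' + body.encode("iso-8859-1")
try:
    ET.fromstring(mismatched)
except ET.ParseError as err:
    print("rejected:", err)

# Truthful declarations parse to the same characters either way.
for enc in ("iso-8859-1", "utf-16"):
    print(enc, ascii(ET.fromstring(with_prolog(body, enc)).text))  # 'S\xf8ren'
```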

-Mike

PS- It's "voilà" -- http://www.bartleby.com/61/81/V0138100.html

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


