[xsl] RE: Smart Quote Encoding


Subject: [xsl] RE: Smart Quote Encoding
From: "Roger L. Cauvin" <roger@xxxxxxxxxx>
Date: Thu, 13 Sep 2007 10:09:20 -0500

On Thu, September 13, 13:56:49 +1000 (EST), Deborah Pickett wrote:

> This error:
>
>>   Error
>>     org.xml.sax.SAXParseException: illegal XML character U+18:illegal
>> XML character U+18
>
> says that you have a character U+18 (i.e., ASCII CAN, decimal 24,
> Ctrl-X) in your file.  That character isn't allowed in XML.  See:
> http://www.w3.org/TR/REC-xml/#charsets
>
> Whatever is generating the "XML" file is putting that character in,
> erroneously.

I have a program that receives text-only e-mails and logs the messages to
XML.  For various reasons (including troubleshooting), I would like to log
the content of the e-mails exactly.  It sounds like that's simply not
possible in XML, since "text-only" e-mail can contain characters that XML
does not allow.

> You will have to either tell the generator to not do that, or you
> will have to insert a pipeline stage that converts U+18 into some
> other character so that the document is actually XML and can be
> parsed.

I guess I have to go with the pipelining strategy.
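
For the record, here is a rough sketch of what such a pipeline stage might
look like.  It is only an illustration, not what my program actually does:
the function name sanitize_for_xml and the "[U+0018]" marker format are my
own inventions, and I am assuming the e-mail text has already been decoded
to a string.  It replaces every character outside the XML 1.0 Char
production with a visible marker, so the log at least records which
character was there.

```python
import re

# Anything NOT matched by the XML 1.0 "Char" production
# (see http://www.w3.org/TR/REC-xml/#charsets):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_for_xml(text):
    """Replace characters that XML 1.0 forbids with a visible marker.

    Hypothetical helper: the "[U+XXXX]" marker format is an assumption
    chosen for readability in a log, not any kind of standard.
    """
    return _ILLEGAL_XML_CHARS.sub(
        lambda match: "[U+%04X]" % ord(match.group()), text
    )


if __name__ == "__main__":
    # Ctrl-X (U+0018) is rewritten; tabs, newlines and smart quotes survive.
    print(sanitize_for_xml("before\x18after"))
```

The output of this stage still has to be escaped (&, <, >) by whatever
writes the XML; the sketch only deals with the characters that cannot
appear in an XML 1.0 document at all.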

> To add to the conformance woes of whatever is producing your input,
> U+18 is not a printable character in ISO 8859-1, nor are smart
> quotes part of true ISO 8859-1 (they are in Windows-1252), so if it
> is producing the XML declaration you quoted then it is doubly wrong.

I inserted the ISO 8859-1 encoding declaration myself, because Saxon 6.3
apparently doesn't support the windows-1252 encoding.  Saxon 8.9J, which I
just now installed, does appear to support that encoding.  However, it
still (correctly) flags the U+18 character as illegal.
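
To illustrate the difference between the two encodings, here is a small
sketch using Python's built-in codecs (the sample bytes are made up, not
taken from my actual e-mails).  The bytes 0x93 and 0x94 are smart quotes
in windows-1252 but C1 control characters in ISO 8859-1, while byte 0x18
decodes to the same control character under either encoding.

```python
# Hypothetical sample: a word wrapped in windows-1252 smart-quote bytes.
raw = b"\x93quoted\x94"

# Decoded as windows-1252, the bytes become real curly quotes.
as_cp1252 = raw.decode("cp1252")

# Decoded as ISO 8859-1, the same bytes become C1 control characters.
as_latin1 = raw.decode("latin-1")

# Byte 0x18 (Ctrl-X) maps to U+0018 under both encodings, so changing
# the encoding declaration cannot make it legal XML.
can_cp1252 = b"\x18".decode("cp1252")
can_latin1 = b"\x18".decode("latin-1")

if __name__ == "__main__":
    print([hex(ord(ch)) for ch in (as_cp1252[0], as_cp1252[-1])])
    print([hex(ord(ch)) for ch in (as_latin1[0], as_latin1[-1])])
    print(hex(ord(can_cp1252)), hex(ord(can_latin1)))
```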

--
Roger L. Cauvin
Cauvin, Inc.
Product Management/Market Research

