RE: [xsl] unicode numeric character references in xml output


Subject: RE: [xsl] unicode numeric character references in xml output
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 30 May 2008 22:21:09 +0100

You can't distinguish characters that were written in the source document as
themselves from characters that were written as numeric character references
- the XML parser doesn't provide this information.

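You can force every character that the output encoding cannot represent to
be written as a numeric character reference by specifying
<xsl:output encoding="iso-8859-1"/>. That escapes everything above U+00FF;
specify encoding="us-ascii" if every non-ASCII character must be escaped.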
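
As a minimal sketch of that approach, assuming an XML-to-XML transform that
otherwise copies its input unchanged (the identity template here is only a
stand-in for the real transformation logic), a complete XSLT 2.0 stylesheet
could look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize as ASCII: any character the encoding cannot represent
       is written out as a numeric character reference. -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Identity template (illustrative placeholder): copy every node
       and attribute through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This only takes effect when the XSLT processor itself serializes the result.
Whether the references come out in decimal or hexadecimal is up to the
serializer, and both forms mean the same thing to any XML parser. Saxon
documents a saxon:character-representation serialization attribute for
choosing between them; check the documentation for your release before
relying on it.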

I tend to think there's something a bit wrong with your system design if it
depends on getting this right. It shouldn't matter how characters are
represented, any more than it matters whether the input is on a local disk
or on the web - you need to get your architectural layering right.

Michael Kay
http://www.saxonica.com/ 


> -----Original Message-----
> From: a k laue [mailto:quiotl@xxxxxxxxx] 
> Sent: 30 May 2008 18:28
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] unicode numeric character references in xml output
> 
> Hello,
> 
> I'm transforming XML to XML, and I need to pass through the 
> unicode numeric character references (hex) from the source to 
> the output. That is, I need "&#x2019;" in the input to appear 
> as "&#x2019;" in the output. I'm using XSLT 2.0 and the Saxon 
> 9B processor.
> 
> Unfortunately, the set of possible character references is 
> large. (The transform works on a very large set of scientific 
> articles. These may include special characters in author 
> names, article titles,
> etc.) I originally looked to character maps as the solution, 
> but I don't see how to map the entire unicode set.
> 
> Thanks,
> Andrea

