
RE: [xsl] Confused about entities


Subject: RE: [xsl] Confused about entities
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 14 Mar 2006 14:36:06 -0000

> I've been trying to search for a solution, in XSLT, to convert HTML
> encoded entities (such as &nbsp;) into the unicode entity (such as
> &#A0;) 

XSLT is a language for converting trees into trees, and these trees don't
include entity or character references; therefore XSLT isn't going to be
quite the right tool for the job. Entities are expanded by the parser while
constructing the source tree, and characters may be converted back into
entities (or more likely, character references) by the serializer when
"deconstructing" the result tree. If you want any control over handling of
entities, you therefore need to look to the parser and the serializer, not
to XSLT proper.
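
To make the division of labour concrete, here is a minimal sketch in Python (not XSLT, and not tied to any particular XSLT processor), assuming only the standard-library xml.etree.ElementTree module. The parser expands the character reference while building the tree, and the serializer alone decides how the character is written back out.

```python
import xml.etree.ElementTree as ET

# The parser expands the character reference while building the tree,
# so the tree holds the NBSP character itself, not a reference to it.
root = ET.fromstring("<p>a&#xA0;b</p>")
text = root.text

# The serializer decides how that character is written out again.
# With an ASCII output encoding it has to fall back to a character reference.
as_ascii = ET.tostring(root, encoding="us-ascii")

# With encoding="unicode" the result is a str and the character stays literal.
as_unicode = ET.tostring(root, encoding="unicode")

print(repr(text))
print(as_ascii)
print(repr(as_unicode))
```

Nothing between the parse and the serialize step can see or influence which physical form was used; that is the position a stylesheet is in.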

Essentially there's a logical view of an XML document and a physical view.
Applications are concerned only with the logical view - they aren't supposed
to care whether an NBSP character is represented as itself, as &nbsp;, as
&#xa0;, or as &#0000000160; (any more than they care about whether attributes
are separated by spaces or by newlines). XSLT too is concerned only with the
logical view. If you want to manipulate the physical representation of the
document, you need a lower-level tool.
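
As an illustration of the logical view, the following sketch (again Python's standard xml.etree.ElementTree, chosen only for demonstration) parses the four physical spellings mentioned above and checks that each yields the same one-character text node. The internal DTD subset declaring nbsp is an assumption added here, since a bare XML parser does not know HTML's entity names.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: nbsp is declared in an internal DTD subset,
# because XML itself predefines only lt, gt, amp, apos and quot.
DTD = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]>'

forms = [
    "<p>\u00a0</p>",          # the character itself
    DTD + "<p>&nbsp;</p>",    # entity reference
    "<p>&#xa0;</p>",          # hexadecimal character reference
    "<p>&#0000000160;</p>",   # decimal character reference with leading zeros
]

# After parsing, every form yields the same single-character text node.
texts = [ET.fromstring(doc).text for doc in forms]
all_same = all(t == "\u00a0" for t in texts)
print(all_same)
```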

Michael Kay
http://www.saxonica.com/

