[xsl] outputting &nbsp; to HTML (final answer)
Subject: [xsl] outputting &nbsp; to HTML (final answer)
From: Trevor Nash <tcn@xxxxxxxxxxxxx>
Date: Sat, 15 Dec 2001 13:35:27 +0000
Not likely, but let's give it a try...
In an attempt to reduce the number of 'how do I get &nbsp;' questions,
I have tried to update Dave Pawson's FAQ on the subject: text follows.
I also sent a message to the list owners to see if we can get the
search mechanism tweaked to make it easier to find &nbsp;.
I actually found it quite hard to locate definitive answers on the
subject which cover all the angles, partly because it has been
discussed so many times, and partly because some need to be edited for
I have paraphrased my recollections of what has been said about
dealing with badly configured / old browsers. I would welcome
pointers to actual messages off the list which I could quote instead,
and any improvements on the ones I have chosen.
Be harsh, I have a flame-proof suit in the loft ;-)
How to output &nbsp; in HTML
[[ existing text from the nbsp topic ]]
> I'm generating HTML from XML
> The output HTML needs to contain some "&nbsp;". But until now I could not
> find a way to implement that.
&nbsp; is by definition &#160;.
Just put &#160; (or &#xA0;) in your stylesheet to represent the
non-breaking space character in the stylesheet tree and result tree.
When the result tree is output, the character will be output as either
&nbsp; or &#160;, assuming you have <xsl:output method="html"/> in the
stylesheet.
> I thought the &nbsp; entity was predefined in xml.
It is not predefined. Only &lt; &gt; &amp; &quot; &apos; are
predefined. You can either use &#160; or &#xA0;, or you can define an
entity like nbsp for the same.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>
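For illustration, a complete stylesheet built around that declaration might look like the sketch below. The template body and its sample text are made up for the example and are not part of the FAQ; only the declaration itself comes from the text above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Declare nbsp in the internal subset so &nbsp; can be used below. -->
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Illustrative template: the paragraph text is hypothetical. -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Both forms reach the processor as the same character, U+00A0. -->
        <p>named&nbsp;entity and numeric&#160;reference</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The XML parser expands both references before the XSLT processor sees the stylesheet, so the entity name is purely a readability aid. How the character is written in the result is decided by the serialiser, not by which form was used here.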
(off list) Apparently one motivation for trying to get &nbsp; into
the output is to cope with browsers that either cannot handle the
encoding being used or have been set up incorrectly (the advice is to
set to 'auto detect' if this option is available).
> Another part of my problem was that a literal character #160 was
> mysteriously coming through not as a non-breaking space, but as a Â
> character, which is ANSI #194.
&#160; in an XML document always refers to UCS character code U+00A0.
This character must be encoded upon output in a document. If your
document is encoded as ISO-8859-1, the character will manifest as the
single byte A0 (in hex, or 160 in decimal). US-ASCII has no byte for it,
so a document in that encoding has to carry it as a character or entity
reference instead. If your document is encoded with UTF-8, it will be
the pair of bytes C2 A0.
If you are looking at the UTF-8 encoded document in an editor or
shell/terminal window that doesn't know to interpret hex C2 A0 as a
UTF-8 sequence, then you'll probably see Â (the character in many
character sets/fonts at position hex C2, aka decimal 194) followed by
an invisible character (A0, which if interpreted as an ISO-8859-x
character happens to be the non-breaking space itself).
If you don't like the encoding your XSLT processor gives you normally,
you can use the encoding attribute on the xsl:output element to
specify a particular encoding (provided your processor knows how to
deal with it).
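As a hedged sketch of that last point: asking for an ASCII-only result leaves the serialiser no way to write U+00A0 as a raw byte, so a conforming processor should fall back to a character or entity reference. Whether "us-ascii" is accepted as an encoding name depends on the processor (only UTF-8 and UTF-16 are guaranteed), and the paragraph text is invented for the example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Request an ASCII-only result; support for this encoding name
       is processor-dependent (an assumption, check your processor). -->
  <xsl:output method="html" encoding="us-ascii"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- U+00A0 cannot be written as an ASCII byte, so it should
             appear in the output as a reference such as &nbsp; or &#160;. -->
        <p>price:&#160;42</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Swapping the value for "iso-8859-1" or "utf-8" lets the processor write the character directly, as the single byte or byte pair described above, though some processors still choose a reference.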
If you are having to deal with old browsers and/or misconfigured
clients which you do not have the power to change, then you might be
left with no choice other than getting &nbsp; into the output. There
is no nice way to do this (as I hope we have already established, the
standards are constructed such that it should not be necessary). But
if it has to be done, here are the choices, and their caveats:
Choose a processor which gives you additional control over the
serialisation: Saxon for example. Caveat: ties you to one processor.
Use <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>,
possibly with the DTD subset trick described above to keep the
stylesheet readable. Caveat: disable-output-escaping doesn't have to
be honoured by the processor. Even if it seems to work, it can be
fragile because it may be ignored if you later decide to send the
output via a DOM, or you use variables and node-set() to store part of
the result tree.
See also [[ xref to http://www.dpawson.co.uk/xsl/sect2/N2215.html ]]
Use an element or processing instruction to represent the non-breaking
space, and substitute it with a custom serialiser. Caveat: hard work,
and ties you to a specific processor or class of processors.
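The disable-output-escaping choice can be sketched as follows. This is a minimal example, not a recommendation: it assumes a processor that honours disable-output-escaping and writes the result directly to a file or stream, and the paragraph text is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- The text node holds the six characters "&nbsp;"; with escaping
             disabled the serialiser writes them verbatim instead of
             turning the ampersand into &amp;. -->
        <p>two<xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>words</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

If the processor ignores the attribute, the browser shows the literal text "&nbsp;" rather than a space, which is one quick way to notice that the trick has stopped working.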
[[ existing text from the nbsp topic ]]
[[ Trevor: I'm not sure this belongs here ]]
Wendell Piez outlines a use in tables with empty cells.
Outputting spaces in html table cells
Use &#160; for a non-breaking space. Your XML parser does not pick up
the named entity &nbsp; because it hasn't been declared. But a
numbered character reference (which is what &#160; is) will be
recognized -- #160 is a non-breaking space.
You can even declare nbsp in an internal subset of your stylesheet if
you want a friendlier representation of the character.
>There is some code before this that generates a table.
>if the value of "blah" is blank, and I was outputting this to html, then netscape would
>not handle blank <td/> fields in an elegant manner because it would shift
>the next column over one to replace the blank column. Normally, I would insert an '&nbsp;'
>between each <td> tag so that netscape would render a space and not ignore the cell, but as
>you know, '&' is reserved in xml. I tried &amp;, but that doesn't render a space but rather
>the real '&' symbol. So my question is what is the best way to solve this problem?
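One way to handle the empty-cell case is sketched below. The element names "row" and "blah" are assumptions standing in for whatever the poster's source XML actually contains, and the surrounding table markup is omitted.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- "row" and "blah" are hypothetical names for the source elements. -->
  <xsl:template match="row">
    <tr>
      <td>
        <xsl:choose>
          <!-- Non-blank value: copy it into the cell. -->
          <xsl:when test="normalize-space(blah)">
            <xsl:value-of select="blah"/>
          </xsl:when>
          <!-- Blank or missing value: emit a non-breaking space so the
               browser still renders the cell. -->
          <xsl:otherwise>&#160;</xsl:otherwise>
        </xsl:choose>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

The text node inside xsl:otherwise survives whitespace stripping because U+00A0 is not one of the XML whitespace characters, so no xsl:text wrapper is needed.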
On the finer points of encodings and character references:
Mike Brown on browser character encodings
Traditional training & distance learning,
Consultancy by email
Melvaig Software Engineering Limited
voice: +44 (0) 1445 771 271
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list