
Re: [xsl] Recognized Unicode characters?


Subject: Re: [xsl] Recognized Unicode characters?
From: "Edward Bryant" <bryant_edward@xxxxxxxxxxx>
Date: Mon, 09 May 2005 09:27:21 -0500

From: David Carlisle <davidc@xxxxxxxxx>

That's not how it works. In XML a character reference &#123;
_always_ refers to a Unicode character number, irrespective of the
encoding. The encoding tells the system what characters the actual bytes in the
file mean.
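
A minimal sketch of that distinction, written in Python as a stand-in for any conforming XML parser (the helper name text_of and the sample documents are made up for illustration). The same character reference yields the same Unicode character under two different encoding declarations, while the raw bytes for a literal character differ from one encoding to the other:

```python
import xml.etree.ElementTree as ET

def text_of(xml_bytes: bytes) -> str:
    """Parse a small XML document and return the root element's text."""
    return ET.fromstring(xml_bytes).text

# Same character reference (8212 = U+2014, em dash) under two encodings.
ref_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>&#8212;</p>'
ref_utf8 = b'<?xml version="1.0" encoding="UTF-8"?><p>&#8212;</p>'

# A literal e-acute (U+00E9): one byte in ISO-8859-1, two bytes in UTF-8.
raw_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>\xe9</p>'
raw_utf8 = b'<?xml version="1.0" encoding="UTF-8"?><p>\xc3\xa9</p>'

# The reference does not depend on the declared encoding.
print(text_of(ref_latin1) == text_of(ref_utf8) == "\u2014")

# The declared encoding decides how the raw bytes are read.
print(text_of(raw_latin1) == text_of(raw_utf8) == "\u00e9")
```

Both comparisons should hold: the reference is resolved by number, and only the literal bytes need the encoding declaration to be interpreted.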

Thanks, that clarification helped.


b) whether they do or not, they should declare the encoding that is
actually used in the file by adding a <meta> element with an http-equiv
that specifies the encoding used.

My output is set to "charset=utf-8".


If your output is using UTF-8 and the character is output as a character
in that encoding (rather than a character reference &#...; or an entity
reference &mdash;), then it will work so long as your browser is set
up to view in UTF-8. This may or may not be automatic depending on
browser settings; see the View/Encoding menu option in IE6.

I tried that. It was set to "Western European (Windows)", but when I switched the view to Unicode, I still got the box in place of the em-dash. So that would mean that the output is not Unicode, right?


In the XML output method a character that is not in the encoding will be
output using a character reference. UTF-8 encodes all of Unicode, so if
you output in that encoding you would not expect to see character
references in the output. If on the other hand you output to encoding
US-ASCII, then only ASCII characters can be output directly, so any
non-ASCII character will be output using a character reference.
The advantage here is that the file itself is then just ASCII encoded, so
it will work on browsers which don't have encoding support correctly set
up. The disadvantage is that if any non-ASCII character is used in a
place where you cannot represent it by a character reference (for
example, if an element name or the content of a comment uses such a
character), then you will get no output and a fatal error that your result
cannot be produced in the specified encoding.
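
The rule above can be sketched roughly as follows. This is a toy Python model of a US-ASCII serializer, not the code of any real XSLT processor, and serialize_ascii is a made-up name. Text content falls back to numeric character references, while an element name has no such fallback and fails:

```python
def serialize_ascii(name: str, text: str) -> bytes:
    """Serialize <name>text</name> as US-ASCII bytes (toy model only)."""
    # Element names cannot contain character references, so a non-ASCII
    # name cannot be represented: strict encoding raises an error.
    tag = name.encode("ascii")

    # Text content can use numeric character references instead.
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    body = escaped.encode("ascii", "xmlcharrefreplace")

    return b"<" + tag + b">" + body + b"</" + tag + b">"

# Non-ASCII text is written as a character reference.
print(serialize_ascii("p", "a\u2014b"))  # b'<p>a&#8212;b</p>'

# A non-ASCII element name cannot be serialized in this encoding.
try:
    serialize_ascii("caf\u00e9", "x")
except UnicodeEncodeError as err:
    print("cannot serialize:", err.reason)
```

The first call gives pure ASCII bytes that any browser can read regardless of its encoding setting; the second corresponds to the fatal-error case described above.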

Thanks, that explanation made the previous posts make sense to me. But when I tried to set the output to US-ASCII, I got an error message saying that the output method must be xml, html or text. So, how do I set the output to US-ASCII?


