Re: [xsl] Recognized Unicode characters?
Subject: Re: [xsl] Recognized Unicode characters?
From: David Carlisle <davidc@xxxxxxxxx>
Date: Mon, 9 May 2005 14:52:12 +0100
> I set output to HTML because that is the output I am creating. (isn't this right?)

Yes. It's the default anyway if the top-level result element is html in no namespace, but setting it doesn't do any harm.

> As I understand it, shouldn't the XSLT processor know from the "encoding" attribute that the references will be to Unicode numbers and read them correctly as those characters.

That's not how it works. In XML a character reference &#123; _always_ refers to a Unicode character number, irrespective of the encoding. The encoding tells the system what characters the actual bytes in the file mean. So, for example, if your file has "abc" it doesn't really have the letters a, b and c; it has bytes with values 97, 98 and 99. In order to know that 97 is an "a", the system needs to know what encoding the file is in. 97=a is the ASCII encoding of a, and many encodings are compatible with this, so there is a tendency to think that this is some universal law. It's not: XML doesn't assume ASCII-compatible encodings, and in fact it mandates support for one encoding that is not ASCII-compatible, UTF-16, in which an "a" is encoded as two bytes, one with value 0 and one with value 97.

> So, I am still confused why a Unicode reference to #8212 won't output correctly? The output displays a square box in both the browser (IE6) as well as in the HTML source itself (viewed via Windows notepad).

Your stylesheet processor has some leeway in what encoding it uses. It can ignore the hints in xsl:output, so the important thing is what it actually used, not just what you asked for. If you are outputting via the html method then

a) many processors will output this character as &mdash;

b) whether they do or not, they should declare the encoding that is actually used in the file by adding a <meta> element with an http-equiv that specifies the encoding used.
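The byte values mentioned above are easy to check outside of XSLT. The following is only a small Python sketch (Python is not part of the original thread, and the variable names are made up for illustration); it shows that the same characters turn into different bytes depending on the encoding, while the number in a character reference such as &#8212; stays a Unicode code point throughout.

```python
# Illustrative sketch only: Python's codecs stand in for "the encoding of the file".
text = "abc"

# In ASCII (and in ASCII-compatible encodings such as UTF-8) the bytes are 97, 98, 99.
ascii_bytes = list(text.encode("ascii"))
utf8_bytes = list(text.encode("utf-8"))

# In UTF-16 (big-endian here, without a byte order mark) "a" takes two bytes.
utf16_bytes_for_a = list("a".encode("utf-16-be"))

# The number in a character reference is a Unicode code point, not a byte value:
# 8212 is U+2014 EM DASH, which UTF-8 happens to store as three bytes.
em_dash = chr(8212)
em_dash_utf8 = list(em_dash.encode("utf-8"))

print(ascii_bytes)        # [97, 98, 99]
print(utf8_bytes)         # [97, 98, 99]
print(utf16_bytes_for_a)  # [0, 97]
print(em_dash_utf8)       # [226, 128, 148]
```

A viewer that reads those three bytes under a different encoding than the one used to write them will not show an em dash, which is why the encoding actually used matters more than the one requested.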
If your output is using UTF-8 and the character is output as a character in that encoding (rather than a character reference &#... or an entity reference &mdas...) then it will work so long as your browser is set up to view in UTF-8. This may or may not be automatic depending on browser settings; see the view/encoding menu option in IE6.

In the XML output method a character that is not in the encoding will be output using a character reference. UTF-8 encodes all of Unicode, so if you output in that encoding you would not expect to see character references in the output. If on the other hand you output to encoding US-ASCII, then only ASCII characters can be output directly, so any non-ASCII character will be output using a character reference.

The advantage here is that the file itself is then just ASCII-encoded, so it will work on browsers which don't have encoding support correctly set up. The disadvantage is that if any non-ASCII character is used in a place where you cannot represent it by a character reference (for example, if an element name or the content of a comment uses such a character) then you will get no output and a fatal error saying that your result cannot be produced in the specified encoding.

David
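As an illustration of the US-ASCII trade-off described in the message, here is a rough Python analogy. It is not how any particular XSLT processor is implemented; serialize_text and serialize_name are hypothetical helper names, and Python's "xmlcharrefreplace" error handler merely plays the role of the serializer's fallback to character references.

```python
# Analogy only: Python codecs imitating an XML serializer's encoding behaviour.

def serialize_text(text: str, encoding: str) -> bytes:
    """Encode text content; characters the encoding lacks become &#N; references."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def serialize_name(name: str, encoding: str) -> bytes:
    """Encode an element name; a reference is not allowed there, so failure is an error."""
    return name.encode(encoding, errors="strict")


content = "before \u2014 after"  # \u2014 is the em dash, character number 8212

# UTF-8 covers all of Unicode, so no character reference is needed.
as_utf8 = serialize_text(content, "utf-8")

# US-ASCII cannot hold the em dash, so it is written as a character reference.
as_ascii = serialize_text(content, "us-ascii")
print(as_ascii)  # b'before &#8212; after'

# A non-ASCII character in an element name has no such fallback.
try:
    serialize_name("caf\u00e9", "us-ascii")
    name_failed = False
except UnicodeEncodeError:
    name_failed = True
print(name_failed)  # True
```

The ASCII result contains only bytes below 128, so it displays correctly whatever encoding the browser guesses, at the cost of failing outright when a name or comment needs a character the encoding lacks.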