
Re: [xsl] Character substitution


Subject: Re: [xsl] Character substitution
From: Jim Fuller <jim.fuller@xxxxxxxxxxxxxx>
Date: Mon, 10 Jan 2005 12:31:46 -0500

> Your input has a reference to Unicode 128. That is a control character
> (on the meaning of which you explicitly shouldn't depend).

I would call it ANSI character number 128 (using windows-1252 is just too newfangled for me), with its associated Unicode number being 8364.
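
A quick way to check the two numbers, sketched in Python (not part of the original thread; it assumes Python's built-in "cp1252" and "iso-8859-1" codecs as stand-ins for "ANSI" and Latin-1):

```python
# Byte 128 (0x80) read under two different encodings.
raw = bytes([128])

as_cp1252 = raw.decode("cp1252")      # Windows "ANSI" code page
as_latin1 = raw.decode("iso-8859-1")  # ISO Latin-1

print(ord(as_cp1252))  # code point of the euro sign
print(ord(as_latin1))  # code point of a C1 control character
```

The same byte gives 8364 under the Windows code page and 128 under ISO-8859-1, which is the mismatch discussed here.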


> Now if you output to ISO-8859-1 then probably you should get the same
> control character (ie byte 128). If your browser happens to decide to be
> non conformant but friendly and show that as a euro, that's either good
> or bad, depending on your point of view. If you output to Windows-1252
> then that doesn't have those control characters (as the space is taken

Isn't ANSI what you wordy folks call windows-1252 (once again, for me this was once known as CP1252)? Doesn't ANSI define ANSI char 128 as the euro?


It is my (feeble) understanding that ANSI characters 32 to 127 correspond to those in the 7-bit ASCII character set, which is just the Basic Latin Unicode character range. The next set of characters, 160-255, corresponds to those in the Latin-1 Supplement Unicode character range, positions 128-159 in Latin-1 Supplement being reserved for controls. Though I might be mistaken, are not most of these positions used for printable characters in ANSI, of which 128 is the euro?
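
The three ranges above can be compared byte by byte. This is only a sketch, assuming Python's "cp1252" codec is a fair stand-in for "ANSI"; `compare_range` is a hypothetical helper name, not anything from the thread:

```python
def compare_range(start, end):
    """Return the byte values in [start, end] where cp1252 and ISO-8859-1 disagree."""
    differing = []
    for value in range(start, end + 1):
        raw = bytes([value])
        latin1_char = raw.decode("iso-8859-1")
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a few slots (e.g. 0x81) unassigned.
            cp1252_char = None
        if cp1252_char != latin1_char:
            differing.append(value)
    return differing

print(compare_range(32, 127))   # ASCII range
print(compare_range(160, 255))  # Latin-1 Supplement printable range
print(compare_range(128, 159))  # the contested block
```

The first two calls report no differences; the third reports every position from 128 to 159, either because cp1252 puts a printable character there or because the codec leaves the slot unassigned.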

OK, I understand the anachronism now... which is why I have probably carried this with me for so long, and why I always started using
&#8364; with UNIX-based systems.


> up with extra printing symbols) so you should get a fatal encoding error

btw the same forgiveness occurs when using &#8364;: it renders as the euro symbol in Mozilla when the character encoding is ISO-8859-1. I must admit that I find it difficult to determine the default behavior.
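
One possible reading of this, as far as I understand it: a numeric character reference names a Unicode code point independently of the byte encoding of the document, so &#8364; can travel inside an ISO-8859-1 file even though the euro itself has no slot there. A small Python sketch of that idea (it uses Python's standard "xmlcharrefreplace" error handler and says nothing about how any particular XSLT serializer behaves):

```python
euro = "\u20ac"  # U+20AC, decimal 8364

# ISO-8859-1 has no slot for the euro, so strict encoding fails...
try:
    euro.encode("iso-8859-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

# ...but the character can still be written as a numeric character reference.
as_reference = euro.encode("iso-8859-1", errors="xmlcharrefreplace")
print(fits_in_latin1, as_reference)
```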


> telling you that you can't linearise character 128 into the windows
> encoding (as that slot is taken up to linearise character 8364).  If
> however the encoding support silently linearises both 128 and 8364 on to
> the same slot (so destroying the round tripping that is supposed to be
> preserved by linearisation) you will see a euro, but whether that is
> good or bad depends on your point of view...
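
The strict behaviour described in the quoted text can be sketched in Python, assuming its built-in codecs (which refuse characters they cannot represent rather than merging them onto one slot); `can_linearise` is a hypothetical helper name:

```python
def can_linearise(char, encoding):
    """True if `char` can be encoded strictly in `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

control_128 = "\u0080"  # Unicode character 128, a C1 control
euro = "\u20ac"         # Unicode character 8364

for encoding in ("iso-8859-1", "cp1252"):
    print(encoding, can_linearise(control_128, encoding), can_linearise(euro, encoding))
```

Under these codecs each encoding accepts exactly one of the two characters, so byte 128 always decodes back to the character that produced it and the round trip is preserved.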

Not quite sure what my view is on this. With all the trickery that lets my computer display characters in multiple encodings while managing all the backward compatibility issues, etc., it seems to be a ball of twine and knots which just makes things work... and makes very little logical sense.


cheers, Jim Fuller

