RE: [xsl] nbsp is not that hard, folks
Subject: RE: [xsl] nbsp is not that hard, folks
From: "Américo Albuquerque" <aalbuquerque@xxxxxxxxxxxxxxxx>
Date: Sat, 9 Nov 2002 11:53:14 -0000
Hi there.

So, what you are saying is that &nbsp; is to XML and HTML what
"#define nbsp" is to C?

-----Original Message-----
From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
[mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Mike Brown
Sent: Friday, November 08, 2002 7:13 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] nbsp is not that hard, folks

Brian Grainger wrote:
> If you're trying to escape &nbsp; in a document encoded as UTF-8, you
> have to use Unicode escaping of the UTF-8 representation of the
> entity. In this case, &nbsp; is equal to &#160;, and &#160; encoded as
> UTF-8 is \u00A0.

Good grief. No, you have your terminology badly mixed up, and you're
throwing in an irrelevant notation. "&nbsp;", "&#160;" and "\u00A0" have
nothing, NOTHING to do with UTF-8.

There is something about nbsp that just confuses the heck out of people.
I think it must be the fact that it looks like a space, and that you
don't have an nbsp key on your keyboard.

OK, read this.

1. There is a character -- an abstract unit in a "script" (a writing
   system; we are using Latin right now) -- called NO-BREAK SPACE by the
   Unicode Standard and ISO/IEC 10646. Unicode and ISO/IEC 10646 assign
   this character an integer number, 160, which is A0 in hex. We say
   Unicode all the time around here, but we mean ISO/IEC 10646 because
   that's what the XML and HTML specs reference. The two standards share
   the same character repertoire and numbering so there's no harm.

2. UTF-8 is an encoding scheme that provides a way of representing any
   of the approximately 1.1 million possible abstract characters in
   Unicode as a sequence of 1 to 4 bytes. The UTF-8 representation of
   the Unicode character 160 (no-break space) is the pair of bytes
   C2 A0, in that order. In contrast, iso-8859-1 is a character map that
   provides a way of representing the first 256 Unicode characters as a
   single byte. us-ascii is an even more limited set of just the first
   128, mapped to a single byte.

3. This thing: \u00A0
   - is a sequence of 6 bytes (ASCII bytes for backslash, u, zero, zero,
     A, zero);
   - has special meaning in a programming language like Java or Python,
     where it is essentially a macro for the no-break space character;
   - is used when representing the character directly as encoded bytes
     is impractical or impossible.

4. This thing: &#160; or this thing: &#xA0;
   - is to SGML applications like HTML and XML what \u00A0 is to
     Java & Python;
   - is called a character reference (or "numeric character reference").

5. This thing: &nbsp;
   - is to SGML applications like HTML and XML an "entity reference";
   - refers to an entity (a separate collection of information) named
     nbsp;
   - depending on the circumstances, is intended to be treated by the
     XML parser or HTML user agent as equivalent to the entity's
     "replacement text";
   - is, in HTML, predefined to have the replacement text of just one
     character, the no-break space;
   - is not defined by default in XML.

6. The thing here in between the quotes: " "
   - is byte 0xA0;
   - is intended to be a no-break space because this email is
     iso-8859-1 encoded;
   - has exactly the same meaning in an XML document as &#160;.

- Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
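
For anyone who wants to check the six points above rather than take them on faith, here is a minimal sketch in Python, one of the two languages Mike names. It is not part of the original post. It assumes Python 3 and only the standard library (`html` and `xml.etree.ElementTree`); the variable and function names are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

NBSP = "\u00A0"  # Python escape for NO-BREAK SPACE (point 3)

# Point 1: the character's number is 160, which is A0 in hex.
code_point = ord(NBSP)

# Point 2: one character, different byte sequences depending on the encoding.
utf8_bytes = NBSP.encode("utf-8")         # two bytes: C2 A0
latin1_bytes = NBSP.encode("iso-8859-1")  # one byte: A0

# Point 3: the escape notation itself is six ASCII characters in source text.
escape_as_typed = r"\u00A0"               # raw string, so no escape processing
escape_length = len(escape_as_typed.encode("ascii"))


def parsed_text(document):
    """Return the text content of the root element of a small XML document."""
    return ET.fromstring(document).text


# Point 4: decimal and hex character references name the same character.
from_decimal_ref = parsed_text("<p>&#160;</p>")
from_hex_ref = parsed_text("<p>&#xA0;</p>")

# Point 6: a literal byte A0 in an iso-8859-1 encoded document means the same.
from_literal_byte = parsed_text(
    b'<?xml version="1.0" encoding="iso-8859-1"?><p>\xa0</p>'
)


def nbsp_entity_defined_in_xml():
    """Point 5: with no DTD declaring it, &nbsp; is an undefined entity in XML."""
    try:
        parsed_text("<p>&nbsp;</p>")
    except ET.ParseError:
        return False
    return True


# Point 5, HTML side: nbsp is predefined there, so an HTML-aware decoder
# resolves the entity reference to the single no-break space character.
from_html_entity = html.unescape("&nbsp;")
```

The three parsed values and `from_html_entity` all compare equal to `NBSP`, while `nbsp_entity_defined_in_xml()` returns `False`. That is the whole argument in miniature: the references and escapes are notations for one character, and UTF-8 only enters the picture when that character is written out as bytes.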