Re: [xsl] XML apparently cannot be used for general text markup: whitespace gripe
Subject: Re: [xsl] XML apparently cannot be used for general text markup: whitespace gripe
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Tue, 19 Mar 2002 10:37:21 -0500
[Chad Jones]
>
> I've noticed a lot of xml-derived web pages out there have screwed up
> whitespace (words crammed together or an incorrect space before ending
> punctuation).
>
> My conclusion is that blocks straight text (such as paragraphs) cannot be
> further marked up with XML without screwing up spacing.
>
> For example, can anyone get this simple document into HTML without either
> removing required spaces or adding inappropriate spaces?
>
> <?xml version="1.0"?>
> <book>
> <par>
> Is his name really <first>John</first> <last>Doe</last>?
> </par>
> </book>
>

You have to distinguish between several different cases.

1) What you see in a browser. Normally (except for text in special elements like <pre>) a browser collapses each run of whitespace characters down to a single space, so the spaces present in the source file display as single spaces.

2) What the xml parser does by default (or by instruction). This affects the whitespace that is passed to the stylesheet processor, and specifically whitespace-only nodes. If whitespace-only nodes are removed, you can get the run-together words you have seen. Microsoft's msxml3 processor (to name one) removes such nodes by default. If you are using it in such a way that you can't tell it to preserve the whitespace-only nodes, you can get the same effect by including an xml:space='preserve' attribute in the root element of the xml file. Then your spaces will remain.

3) What the xslt processor does. This is controlled by xsl:preserve-space or xsl:strip-space elements, which also operate on whitespace-only nodes. By default the whitespace-only nodes are preserved.

The final result depends on the default or instructed behavior of the parser, combined with the presence or absence of these stylesheet instructions. For the Microsoft parser, the whitespace-only nodes are removed unless you instruct otherwise; for Saxon they stay.
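To make case 2 concrete, here is a rough sketch in Python (standard library only) of what happens to the sample document when whitespace-only text nodes are stripped, and how an xml:space='preserve' attribute on an ancestor protects them. This is only a simulation under my own assumptions, not the actual behavior of msxml3 or Saxon; the helper names (space_preserved, strip_whitespace_only, par_text) are made up for illustration.

```python
from xml.dom import minidom

# The sample document from the original question.
SOURCE = """<?xml version="1.0"?>
<book>
<par>
Is his name really <first>John</first> <last>Doe</last>?
</par>
</book>"""

XML_NS = "http://www.w3.org/XML/1998/namespace"


def space_preserved(node):
    """True if the nearest ancestor that carries xml:space says 'preserve'."""
    parent = node.parentNode
    while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        value = parent.getAttributeNS(XML_NS, "space")
        if value:
            return value == "preserve"
        parent = parent.parentNode
    return False


def strip_whitespace_only(node):
    """Remove whitespace-only text nodes unless xml:space protects them.

    Hypothetical helper that imitates a stripping parser; it is not
    how any particular product is implemented.
    """
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if not child.data.strip() and not space_preserved(child):
                node.removeChild(child)
        else:
            strip_whitespace_only(child)


def text_of(node):
    """Concatenate all text below a node, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def par_text(source, strip):
    """Return the <par> text as a browser would show it (runs of whitespace collapsed)."""
    doc = minidom.parseString(source)
    if strip:
        strip_whitespace_only(doc.documentElement)
    par = doc.getElementsByTagName("par")[0]
    return " ".join(text_of(par).split())


if __name__ == "__main__":
    print(par_text(SOURCE, strip=False))
    print(par_text(SOURCE, strip=True))
    protected = SOURCE.replace("<book>", '<book xml:space="preserve">')
    print(par_text(protected, strip=True))
```

The single space between </first> and <last> is a whitespace-only text node, so stripping it runs the two names together, while the space after "really" survives because it belongs to a node that also contains text.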
I have noticed that the xml:space attribute in the source file has priority over xsl:strip-space in the stylesheet (at least for msxml3 and Saxon), but I don't know if that is specified somewhere or not (Mike Kay will no doubt give us the definitive answer here).

Cheers,

Tom P

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list