
Re: [xsl] Effects of white space between xml elements


Subject: Re: [xsl] Effects of white space between xml elements
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 06 Mar 2009 11:24:44 -0500

Nat,

At 01:09 AM 3/6/2009, you wrote:
> Thanks a lot for all your suggestions.  If I am interpreting what you
> all are saying right, white space is handled differently depending on
> a few factors. 1) the transformer can follow different rules 2) the
> stylesheet can have rules that tell the transformer specifically how
> and when to handle white space.

That's essentially correct, with the refinement that in category (1), all XSLT 1.0 engines should be the same (with one significant exception), while the differences between XSLT 2.0 engines will be more significant and require more attention.


For example, Saxon 9's command-line interface has a "-strip" option that switches whitespace handling. It accepts three values: 'all', 'none' and 'ignorable'. 'none' is the proper XSLT 1.0 behavior; 'ignorable' is a 2.0 feature that requires reference to a schema (a whitespace-only text node child of an element that allows only element children is necessarily ignorable).

'all' would perhaps not be entirely conformant in 1.0, which says that whitespace should never be stripped from input unless xsl:strip-space says to do so. This, however, arguably contradicts the principle that an XSLT engine can accept input that has already been subject to processing -- which might include whitespace munging -- and at least one widely-installed XSLT 1.0 engine, MSXML, will ordinarily strip whitespace-only nodes unless you take measures to prevent it. In 2.0, the rules are more explicit (at least as I read them :-) that this is allowable, and clearer that if whitespace-only nodes (or indeed any input at all) are gone before the XSLT engine even gets them, there's nothing to be done about it on a general basis.

The reason for this is that whether whitespace-only nodes make it into the source tree of a transform is really not the job of the transformation at all, but rather of processes preceding transformation. While this is commonly initiated by parsing an XML document (in which ignorable whitespace nodes are ubiquitous), it's also commonly not (maybe the XML is stored in a database, or generated dynamically).
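One way to check what actually reached the source tree is to ask the processor itself. The following is only a minimal XSLT 1.0 sketch, not something from the thread; it assumes no xsl:strip-space declaration is in effect, and it reports each whitespace-only text node together with the name of its parent element:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A text node whose normalized value is empty is whitespace-only. -->
    <xsl:text>whitespace-only text nodes: </xsl:text>
    <xsl:value-of select="count(//text()[not(normalize-space())])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- List the parent element of each such node. -->
    <xsl:for-each select="//text()[not(normalize-space())]">
      <xsl:text>  child of </xsl:text>
      <xsl:value-of select="name(..)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If the count comes back as zero for an input you know to be pretty-printed, something upstream (parser settings, a database, an earlier pipeline step) has probably removed those nodes before the transformation started.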

A further refinement is that the controls offered in XSLT itself, xsl:strip-space and xsl:preserve-space (even leaving aside coding idioms such as select="*" to work around any text node children), are generally sufficient to the task, at least assuming you are working to a more or less fixed document type. If so, and as long as you're starting with serialized XML source (and not working within a database or pipeline architecture), the schema support and other options that come in 2.0 can be viewed as a complication as well as a convenience. There's nothing wrong with just planning that no whitespace-only nodes will be stripped on input (and then perhaps ensuring they're not), and managing it all from within the XSLT. That's a prudent and workable approach, and requires only that you Know Your Tree, which is Rule #1 for writing reliable XSLT code in any case.
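As a rough sketch of managing it from within the stylesheet, assume a hypothetical document type in which "section" and "list" contain only elements, while "para" and "code" contain mixed content. These element names are invented for illustration and do not come from the thread:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical element names: strip whitespace-only text nodes
       from containers that hold only elements. -->
  <xsl:strip-space elements="section list"/>

  <!-- Preserving is already the default; this declaration is redundant
       here and serves mainly as documentation of intent. -->
  <xsl:preserve-space elements="para code"/>

  <!-- The select="*" idiom: process element children only, so any
       whitespace-only text node children are never visited,
       whether or not they were stripped. -->
  <xsl:template match="list">
    <ul>
      <xsl:apply-templates select="*"/>
    </ul>
  </xsl:template>

  <xsl:template match="item">
    <li><xsl:apply-templates/></li>
  </xsl:template>

</xsl:stylesheet>
```

The two mechanisms overlap on purpose: xsl:strip-space changes the tree the stylesheet sees, while select="*" simply avoids looking at text node children, which keeps the "list" template safe even if the declaration is later removed.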

This is a complicated issue in part because serializers are also allowed to add cosmetic whitespace ... which then may have to be recognized as such by downstream processors....
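On the XSLT side, the usual source of such cosmetic whitespace is the indent attribute of xsl:output. A minimal sketch, assuming a plain identity transform, in which the serializer is permitted (not required) to add whitespace when writing the result:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- indent="yes" lets the serializer add whitespace for readability;
       how much it adds is up to the processor. -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

A downstream stylesheet reading that output would see the added whitespace as ordinary whitespace-only text nodes and would need xsl:strip-space (or the select="*" idiom) to disregard them. The XSLT 1.0 Recommendation itself cautions that indent="yes" is usually not safe for document types with mixed content.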

I'm sure that David, Mike, Ken, Tony or someone will weigh in if this summary isn't entirely accurate. :-)

Cheers,
Wendell



======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

