
Re: [xsl] XHTML DTD aware transformation and indentation behaviour


Subject: Re: [xsl] XHTML DTD aware transformation and indentation behaviour
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 02 Feb 2012 11:09:44 +0000

On 02/02/2012 10:48, Matthieu Ricaud-Dussarget wrote:
Hi all,

In my project I concatenate multiple XHTML files into one XML file. This aggregate file has to be edited by hand, which means indentation matters here for convenience.

Before I discovered XML Catalog, I used to delete all DOCTYPE declarations within the source XHTML files with a Perl script (which also replaced named entities with their UTF-8 characters). This worked fine: the concatenated files were indented exactly like the XHTML sources.

But this was a bit dangerous, in case the Perl script failed to match some special entity that needed replacing. And it was not really good XML practice.

Now I'm using a local XML Catalog and run my transformation with Saxon on the command line with these options:
-r:org.apache.xml.resolver.tools.CatalogResolver -x:org.apache.xml.resolver.tools.ResolvingXMLReader -y:org.apache.xml.resolver.tools.ResolvingXMLReader


I can't see exactly what's happening here because your mail client and mine have conspired to ignore the whitespace which was critical to understanding your message.

Generally, if you validate against a DTD, then whitespace in elements whose content model is defined as element-only (for example head and body) will be treated as ignorable, which means it's liable to be lost in a copy operation. Perhaps this is what is happening.
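
To make the distinction concrete, here is a minimal sketch using the JDK's built-in validating SAX parser rather than Saxon itself. The class name, the method name and the miniature DTD are all made up for illustration: "doc" stands in for an element-only element such as body, and "p" for an element with text content. It only shows what the parser reports at the SAX level; what a given Saxon version then does with that whitespace is a separate question.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {

    // Hypothetical miniature DTD (not the XHTML DTD):
    // "doc" has element-only content, "p" contains text.
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (p+)>\n"
        + "  <!ELEMENT p (#PCDATA)>\n"
        + "]>\n"
        + "<doc>\n  <p>one</p>\n  <p> two </p>\n</doc>";

    /** Returns { chars reported as ignorable whitespace, chars reported as ordinary text }. */
    static int[] countWhitespace(String xml) {
        final int[] counts = new int[2];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // validate against the DTD in the DOCTYPE
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void ignorableWhitespace(char[] ch, int start, int length) {
                    counts[0] += length; // whitespace between child elements of "doc"
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    counts[1] += length; // text inside "p", including its spaces
                }
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // fail loudly if the sample is not valid
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = countWhitespace(XML);
        System.out.println("ignorable=" + c[0] + " characters=" + c[1]);
    }
}
```

The newlines and indentation directly inside "doc" arrive through ignorableWhitespace(), while the spaces inside "p" arrive as ordinary characters. A consumer that discards ignorable whitespace will therefore lose exactly the indentation between child elements of element-only containers, which matches the symptom described.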

Try the option -strip:none on the command line to prevent this behaviour. The documentation says this is the default, but I'm not convinced it is correct: I seem to remember it changing some time ago in response to a W3C change.

Michael Kay
Saxonica

