Re: [xsl] Preserving inline DTD

Subject: Re: [xsl] Preserving inline DTD
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Tue, 28 Jan 2014 09:41:00 -0500


I'm afraid the OP is asking not about a system or public identifier on
a DOCTYPE declaration, but about preserving a DTD internal subset.

As David has remarked, this is not possible in unextended XSLT 1.0,
which was designed specifically for a defined use case: "XSLT is not
intended as a completely general-purpose XML transformation language.
Rather it is designed primarily for the kinds of transformations that
are needed when XSLT is used as part of XSL"
(http://www.w3.org/TR/xslt). Whenever "XSL" is used in a context like
this, we have to add (as the sentences in the Rec do also) that what
we mean by "XSL" is "XSLT + XSL-FO".

An XSL-FO processor has no need for a DTD internal subset; indeed in
that architecture one would ordinarily consider one to be irregular
and superfluous if not worse.

So the XSLT 1.0 answer is "extend your processor". Implement a custom
serialization method for your processor that does whatever you want.

The XSLT 2.0 answers might include "sniff the internal subset from the
input and fake it for the output". As Graydon suggests, you could use
the unparsed-text() function for the sniffing part. For the rest, the
reason I say "fake it" is that I know of no off-the-shelf serializers
that will write a DTD internal subset, so in XSLT you'd have to use
disable-output-escaping, which we generally -- um -- frown on.

You could combine these answers: embed your XSLT 1.0 transformation in
a pipeline that would provide the serialization you want as a
post-process. You might choose not to use XSLT at all for the rest of
the pipeline.
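
A minimal sketch of such a post-process, in Python rather than XSLT:
read the DOCTYPE declaration (with its internal subset) from the source
as text, run the transformation as usual, then put the declaration back
into the serialized result. The function names extract_doctype and
restore_doctype are made up for illustration. The regular expression
assumes that no "]" followed by ">" occurs inside the subset (for
example within an entity value or a comment) and that the declaration's
identifiers contain no "[" or ">"; it is not a substitute for a real
parser. It also assumes the result has the same root element as the
source and carries no DOCTYPE of its own.

```python
import re

# Assumption: "]" followed by optional whitespace and ">" ends the internal
# subset, i.e. that sequence never occurs inside a literal or comment within it.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+[^\[>]*\[.*?\]\s*>", re.DOTALL)

# The XML declaration, if present, must be the very first thing in the document.
XML_DECL_RE = re.compile(r"\A<\?xml\s.*?\?>", re.DOTALL)


def extract_doctype(source_text):
    """Return the DOCTYPE declaration including its internal subset,
    or None if the source has no DOCTYPE with an internal subset."""
    match = DOCTYPE_RE.search(source_text)
    return match.group(0) if match else None


def restore_doctype(result_text, doctype):
    """Insert the saved DOCTYPE into the transformation result,
    directly after the XML declaration (or at the start if there is none)."""
    if doctype is None:
        return result_text
    decl = XML_DECL_RE.match(result_text)
    insert_at = decl.end() if decl else 0
    head = result_text[:insert_at]
    tail = result_text[insert_at:].lstrip()
    prefix = head + "\n" if head else ""
    return prefix + doctype + "\n" + tail
```

In use, extract_doctype() would be called on the text of the input
file and restore_doctype() on whatever the XSLT processor wrote. Note
that this restores only the declarations: entity references in the
content have already been expanded by the parser and stay expanded in
the result.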

What David C doesn't tell us is that he could implement such a
pipeline using Unix tools in just a few minutes. Of course, this gets
us into questions of platform dependencies, etc. There's also Ant and
suchlike.

One might mention XProc, except that doing so opens a can of worms,
since XProc has its own set of issues (and then we're off topic).

Cheers, Wendell

On Mon, Jan 27, 2014 at 6:56 PM, Graydon <graydon@xxxxxxxxx> wrote:
> On Mon, Jan 27, 2014 at 03:35:33PM -0800, Martin Holmes scripsit:
>> On 14-01-27 03:33 PM, David Carlisle wrote:
>> >On 27/01/2014 23:26, Piotr Fusik wrote:
>> >>How do I make xsltproc preserve the DTD that is in the input XML ?
>> >
>> >Unless it has a non-standard extension (which I don't recall is the
>> >case) then this is not possible. Standard XSLT cannot do this as the
>> >DTD is expanded out by the XML parser and not reported to XSLT which
>> >just sees a tree of element, text and attribute nodes.
>> Couldn't the XSLT re-read the source document as text, using the
>> document() function, and recover the DTD section with string
>> manipulation?
> document() will insist on parsing the document, so I don't think so, no.
> If you have unparsed-text() (which xsltproc won't because it's XSLT 1.0)
> you can do that to get the contents of the DOCTYPE declaration.
> If it will always be the same DTD, or you know what DTD it will be at
> run time, you can get the xsl:output to create a DOCTYPE declaration in
> the result document by setting the doctype-public and possibly
> doctype-system attributes on xsl:output to the values you want, which
> might have been what the original question was about.
> -- Graydon

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
