
Re: [xsl] doctype


Subject: Re: [xsl] doctype
From: Colin Paul Adams <colin@xxxxxxxxxxxxxxxxxx>
Date: 30 Jun 2006 18:56:32 +0100

>>>>> "Marcus" == Marcus Streets <marcus@xxxxxxxxxxx> writes:

    Marcus> I'm probably missing something trivial here.  I have an xml
    Marcus> document with the doctype:

    Marcus> <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet
    Marcus> href="http://localhost/xslt/docbook/html/docbook.xsl"
    Marcus> type="text/xsl"?> <!DOCTYPE book SYSTEM
    Marcus> "../../System/DTD/main.dtd"[ <!NOTATION XML SYSTEM "">
    Marcus> <!NOTATION MIF SYSTEM ""> <!NOTATION TIF SYSTEM "">
    Marcus> <!NOTATION AI SYSTEM ""> <!ENTITY % catalog PUBLIC
    Marcus> "-//Siberlogic//ENTITIES V3.0.1//EN"
    Marcus> "file:///C:/xml/fips/catalog.pen"> %catalog; ]>

    Marcus> On which I am going to run an identity transformation
    Marcus> which is going to do some filtering.

    Marcus> The question is - can I keep the Doctype as is?

    Marcus> There are various xsl:output options, but I seem to need
    Marcus> to know what the doctype is - and I really just want to
    Marcus> pass it through.

The first problem is reading the doctype - the DOCTYPE declaration is
not part of the data model that the stylesheet sees, so this
information is lost when the xml file is parsed.

If you are able to use XSLT 2.0, then you can recover the information
by reading the file a second time, using the unparsed-text() function.

You could then use the various XPath 2.0 string functions to extract
the DOCTYPE internal subset yourself.
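
As a rough sketch of that second read (untested against any particular
processor), the stylesheet below re-reads the source with unparsed-text()
and slices out the declaration with substring-after() and
substring-before(). The variable names are made up for illustration. It
assumes the source was loaded from a URI that base-uri() can report, that
the file is UTF-8, that it really has an internal subset, that no '['
appears before the one opening the subset, and that the declaration ends
with ']>' with no whitespace in between.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Text output, so the extracted strings are written as-is. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Second read of the source, this time as a plain string.
       Assumption: the source has a base URI and is UTF-8 encoded. -->
  <xsl:variable name="raw" as="xs:string"
      select="unparsed-text(base-uri(/), 'UTF-8')"/>

  <!-- Text following the DOCTYPE keyword, up to (not including) ']>'. -->
  <xsl:variable name="doctype-body" as="xs:string"
      select="substring-before(substring-after($raw, '&lt;!DOCTYPE'), ']&gt;')"/>

  <!-- The internal subset: everything after the first '['. -->
  <xsl:variable name="internal-subset" as="xs:string"
      select="substring-after($doctype-body, '[')"/>

  <xsl:template match="/">
    <xsl:text>Declaration: &lt;!DOCTYPE</xsl:text>
    <xsl:value-of select="$doctype-body"/>
    <xsl:text>]&gt;&#10;</xsl:text>
    <xsl:text>Internal subset: </xsl:text>
    <xsl:value-of select="$internal-subset"/>
  </xsl:template>

</xsl:stylesheet>
```

If either marker is missing, substring-before() and substring-after()
return an empty string, so a document without an internal subset yields
empty output rather than an error. A comment containing the text
"<!DOCTYPE" ahead of the real declaration would fool it.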

    Marcus> If I have to define it - how do I define the part within
    Marcus> the square brackets. I can see how to specify the rest
    Marcus> but not that.

There is no standard way of doing this. Some processors provide a
means to specify this information. Let's assume you are using XSLT
2.0, and you have read in and isolated the internal subset with
unparsed-text().

In the case of Saxon 8, there are processor-specific facilities to
specify the various components of the internal subset (look at the Saxon
documentation). In this case, you would have to completely parse the
internal subset, and then write each part out (I think).

In the case of gestalt, there is a processor-specific output method
that allows you to specify the entire internal subset as a
string. This would be ideal for your scenario (although I don't claim
to have had any wonderful foresight here - I was just writing the
output method as an example of how to do it).

In either case, you are fighting against the rationale of XSLT
processing - the information set of the xml document is the intended
input to a transformation. So it would be better to see if you can
avoid the whole scenario (maybe a non-XSLT approach is what you need).
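
For completeness, here is a sketch of one workaround sometimes used when
no processor-specific facility is available. It is not something proposed
above, and it is no more than an illustration: the declaration is
recovered as a string and written ahead of an identity transform with
disable-output-escaping. That attribute is deprecated in XSLT 2.0, a
processor is allowed to ignore it, and it only has any effect when the
processor serializes the result itself. The variable names are
hypothetical, and the same assumptions apply as for any string-based
extraction (UTF-8 source with a usable base URI, an internal subset
present, declaration ending in ']>').

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Re-read the source as plain text (assumed UTF-8). -->
  <xsl:variable name="raw" as="xs:string"
      select="unparsed-text(base-uri(/), 'UTF-8')"/>

  <!-- Rebuild the whole declaration, internal subset included. -->
  <xsl:variable name="doctype" as="xs:string"
      select="concat('&lt;!DOCTYPE',
                     substring-before(substring-after($raw, '&lt;!DOCTYPE'),
                                      ']&gt;'),
                     ']&gt;')"/>

  <!-- Write the declaration unescaped, then process the document. -->
  <xsl:template match="/">
    <xsl:value-of select="$doctype" disable-output-escaping="yes"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Identity template; filtering templates would override this. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Even where this works, the output is not a faithful copy of the input:
entity references have already been expanded by the parser, attribute
defaults from the DTD appear as explicit attributes, and any processing
instruction that preceded the DOCTYPE in the source now follows it.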
-- 
Colin Adams
Preston Lancashire

