Generating HTML5 DOCTYPE with XSLT

Posted: Sun Mar 18, 2012 6:26 am
by sarcanon
I have been trying (and failing) all evening to generate proper HTML5 documents using XSLT, with various combinations of output methods and @doctype-system and @doctype-public attribute values. But no matter what I try, I cannot produce a simple DOCTYPE like this:

<!DOCTYPE html>
Might someone please be able to tell me what I am doing wrong?

FWIW, I am using Oxygen 13.2 build 2012030716 and the Saxon-HE 9.3.0.5 processor on Windows 7 (64-bit). The beginning of my stylesheet looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:t="http://www.tei-c.org/ns/1.0"
exclude-result-prefixes="t" version="2.0">

<xsl:output method="html" encoding="UTF-8"/>
Many thanks in advance.

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Mon Mar 19, 2012 9:44 am
by Radu
Hi,

You could use an xsl:text element with disable-output-escaping to write the DOCTYPE declaration (with no public or system ID) literally, like this:

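<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xhtml"/>
  <xsl:template match="/">
    <!-- The markup has to be escaped inside the stylesheet to keep it well-formed;
         disable-output-escaping then writes it unescaped, as <!DOCTYPE html>. -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;</xsl:text>
    <html xmlns="http://www.w3.org/1999/xhtml">
      ..........................
    </html>
  </xsl:template>
</xsl:stylesheet>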
Regards,
Radu

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Tue Mar 20, 2012 8:49 am
by sarcanon
Dear Radu:

That works. Thank you very much.

But I am a little surprised that there isn't a more elegant solution within XSLT to address what must be a common requirement. I surely cannot be the only person who wants to generate HTML5 docs in this manner.

In any event, thanks again for the solution.

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Tue Mar 20, 2012 9:47 am
by Radu
Hi,

From what I've tested, a web browser always interprets an opened HTML file as HTML5, so you do not need to add <!DOCTYPE html> to the file for the browser to treat it as HTML5.

From an XML point of view, a declaration like <!DOCTYPE html>, without a public or system ID, is meaningless, which is probably why this XSLT construct:

<xsl:output method="xml" doctype-system="" doctype-public=""></xsl:output>
does not generate a DOCTYPE at all.

Here is a list of differences between HTML 5 and 4:

http://www.w3.org/TR/html5-diff/

Regards,
Radu

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Thu Mar 22, 2012 9:53 am
by jelovirt
HTML5 has a DOCTYPE legacy string:

<!DOCTYPE html SYSTEM "about:legacy-compat">
That can be generated with:

<xsl:output doctype-system="about:legacy-compat"/>
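To see jelovirt's suggestion in context, here is a minimal sketch of a complete stylesheet. It assumes an XSLT 2.0 processor such as the Saxon-HE mentioned in the question and uses the xhtml output method so the result stays well-formed XML; the title text and the body processing are hypothetical placeholders. The exact line breaks a serializer puts inside the DOCTYPE can vary between processors.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Only doctype-system is given, so the serializer writes a DOCTYPE
       with a SYSTEM identifier (and no PUBLIC identifier) before the
       first element: html SYSTEM "about:legacy-compat".
       omit-xml-declaration="yes" merely drops the <?xml ...?> line. -->
  <xsl:output method="xhtml" encoding="UTF-8"
              doctype-system="about:legacy-compat"
              omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <!-- Hypothetical placeholder title -->
        <title>Example</title>
      </head>
      <body>
        <!-- Placeholder: process the source document here -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```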

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Tue Mar 25, 2014 4:38 pm
by jannylun
I agree with jelovirt: the preferred way from XSLT is to use the proper legacy string. However, the resulting document does not currently validate in Oxygen 15.

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Tue Mar 25, 2014 4:57 pm
by Radu
Hi Jan,

There is a problem with the approach:

<!DOCTYPE html SYSTEM "about:legacy-compat">
Basically, Oxygen correctly detects the document as XHTML5 and uses the XHTML5 Relax NG schemas to validate it.
But because the document declares a SYSTEM identifier in its DOCTYPE, any XML parser (such as the Xerces parser used for the Relax NG validation) will try to resolve that SYSTEM reference and load the resource before parsing the rest of the document, since the DTD might supply entity declarations or default attribute values. The parser cannot resolve about:legacy-compat, so it fails.

As a workaround, you could create an XML catalog file with this content:

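<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the legacy-compat system ID to a local, empty DTD.
       A relative uri is resolved against the catalog file's location. -->
  <system systemId="about:legacy-compat" uri="test.dtd"/>
</catalog>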
This maps the system ID to an empty file named test.dtd. Then add the catalog to the catalogs list on the Oxygen Preferences > XML > XML Catalog page.

In my opinion, a web browser treats any HTML document without a specific DOCTYPE as HTML5, so you do not need to add any particular DOCTYPE declaration to it.

Regards,
Radu

Re: Generating HTML5 DOCTYPE - Use version="5" attribute on output method

Posted: Fri May 04, 2018 3:28 am
by catrachos
The information in this thread is not current, so if anyone stumbles onto it as I did, I'll save you some time with up-to-date information. Saxon 7 supports a version attribute on the xsl:output element. The following output declaration puts <!DOCTYPE HTML> at the top of the HTML output:

<xsl:output method="html" encoding="UTF-8" version="5"/>
Optionally, you can add the indent="yes|no" attribute. For a named output definition, you can use something like the following:

<xsl:output name="html5" method="html" encoding="UTF-8" version="5"/>
and then reference it from the format attribute of the corresponding xsl:result-document, in this case:

<xsl:result-document href="filename.html" format="html5">
...
</xsl:result-document>
This worked for me. Other people apparently use version="5.0" with the same result, but the Saxon documentation just says 5, so that's what I used.
It may also be supported by recent pre-7 versions of Saxon and by other recent XSLT/XPath processors; check the documentation for your processor.
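Putting the two snippets from this post together, a complete stylesheet might look like the sketch below. It assumes a recent Saxon (Radu reports 9.6 through 9.8 working); the file name chapter.html, the titles, and the template contents are hypothetical. The unnamed xsl:output governs the principal result, while the named html5 definition applies only to the document written by xsl:result-document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Unnamed (default) output definition, used for the principal result.
       With Saxon, version="5" on the html method yields <!DOCTYPE HTML>. -->
  <xsl:output method="html" encoding="UTF-8" version="5" indent="yes"/>

  <!-- Named output definition, referenced below through format="html5". -->
  <xsl:output name="html5" method="html" encoding="UTF-8" version="5"/>

  <xsl:template match="/">
    <!-- Principal result: serialized with the unnamed xsl:output. -->
    <html>
      <head><title>Index</title></head>
      <body><p>See chapter.html.</p></body>
    </html>

    <!-- Secondary result: serialized with the named "html5" format.
         The href value is a hypothetical file name. -->
    <xsl:result-document href="chapter.html" format="html5">
      <html>
        <head><title>Chapter</title></head>
        <body><xsl:apply-templates/></body>
      </html>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```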

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Fri May 04, 2018 7:40 am
by Radu
Hi,

Thanks for the update. I tested with Saxon 9.6, 9.7, and 9.8, and they all produce the correct output.
I think my original approach in this thread was to obtain valid HTML5 that was also well-formed XML, so that it could be processed further (with XSLT, for example) or opened and edited in Oxygen, since Oxygen does not handle content that is not well-formed XML too well.
But if someone wants HTML5 that adheres to the HTML5 specification without necessarily being well-formed XML, your advice is the way to go.

Regards,
Radu

Re: Generating HTML5 DOCTYPE with XSLT

Posted: Fri May 04, 2018 6:11 pm
by catrachos
Thanks, Radu. Sorry :( I meant to say Saxon 9.7, not 7, but you got it straight. :) I thought HTML5 was, or should be, well-formed XML, but ignore my confusion. :? Ciao.