[oXygen-user] HTML5 transformation

Tue Aug 5 07:33:23 CDT 2014

Hello Peter,

Oxygen uses the Epubcheck schemas for validating EPUB3/(X)HTML5.
Unfortunately, this is a case of the HTML5 specification changing and 
the schemas not keeping up with the change.
The older HTML5 specs specifically forbade using 
http-equiv="content-type" in XML documents, while the newer specs allow 
it.

Regarding the 'Encoding declaration state (http-equiv="content-type")' 
the W3C Candidate Recommendation from 17 December 2012 said:
http://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#attr-meta-http-equiv-content-type
> The Encoding declaration state may be used in HTML documents, but 
> elements with an http-equiv attribute in that state *must not be used 
> in XML documents.*
However, since the 2013 CR and also in the current W3C Candidate 
Recommendation this has changed:
http://www.w3.org/TR/2013/CR-html5-20130806/document-metadata.html#attr-meta-http-equiv-content-type
> The encoding declaration state *may be used in HTML documents and in 
> XML Documents.* If the encoding declaration state is used in XML 
> Documents, the name of the character encoding must be an ASCII 
> case-insensitive match for the string "UTF-8" (and the document is 
> therefore forced to use UTF-8 as its encoding).

This is currently logged as a bug for the Epubcheck validator:
content-type value should be allowed for http-equiv attribute of meta tag
https://code.google.com/p/epubcheck/issues/detail?id=135

To keep a long story short(er), the simplest solution for this problem 
is to patch the schema to accept 'http-equiv="Content-Type"' (which is 
allowed as per the latest HTML5 specification). See the attached RNC 
schema file, which contains this patch.
To apply this patch, save the 'html5-document-30.rnc' file locally and 
copy it to your Oxygen installation folder in:
Oxygen/frameworks/xhtml/xhtml5 (epub3)/mod/html5
Replace the existing file when prompted to. Depending on the location 
where Oxygen is installed you may need admin access to perform this 
operation.

To answer your questions:
> Is it possible to circumvent the copy of the existing head 
> subelements, including meta?
No, not without changing the output method to "xml". The meta is 
not actually a copy of the original: when you specify
<xsl:output method="xhtml"/>, Saxon's serializer adds that meta to the 
output.
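
As a rough sketch of the method="xml" route mentioned above (not the 
stylesheet from this thread): the example below assumes an XSLT 2.0 
processor such as Saxon and an XHTML source in the 
http://www.w3.org/1999/xhtml namespace. The "h" prefix and the choice 
to drop the source meta are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <!-- With method="xml" the serializer does not add a Content-Type
       meta element to head. -->
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity template: copy everything unless overridden below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: the EPUB2 source meta is not wanted in the output,
       so any meta with http-equiv="Content-Type" (any case) is
       dropped. -->
  <xsl:template
      match="h:meta[lower-case(@http-equiv) = 'content-type']"/>

</xsl:stylesheet>
```

Note that this output has no <!DOCTYPE html>, which the XML 
serialization of HTML5 does not require. XSLT 2.0 also defines an 
include-content-type attribute on xsl:output; setting it to "no" 
together with method="xhtml" may be worth trying before switching 
methods, though that was not tested for this case.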

> Why is the xslt so stubborn about http-equiv="Content-Type", when the 
> validation rejects it for html5?
It's not really the XSLT's fault; it's the validation that shouldn't 
reject it.

Let us know if you need additional assistance.

Regards,
Adrian

Adrian Buza
oXygen XML Editor and Author Support

Tel: +1-650-352-1250 ext.2020
Fax: +40-251-461482

http://www.oxygenxml.com

On 05.08.2014 12:23, Peter West wrote:
> I'm transforming EPUB2 files to xhtml5 for EPUB3.
>
> Here's the beginning of one of the EPUB2 files.
> <?xml version='1.0' encoding='utf-8'?>
> <html xmlns="http://www.w3.org/1999/xhtml">
>   <head>
>     <title>Personal Knowledge: Towards a Post-Critical Philosophy</title>
>     <meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type" />
> 	...
>   </head>
>
> My xsl file has, inter alia, this:
>
>     <xsl:output method="xhtml" html-version="5"/>
>
> When I do the transform, I get html5 output, including the meta element, like so:
>
> <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html>
> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
>    <head>
>       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
>       <title>Personal Knowledge: Towards a Post-Critical Philosophy</title>
> 	...
>    </head>
>
> Oxygen and Jing do not like
>   http-equiv="Content-Type"
>
> Uh-uh, no way.  They want Default-Style or Refresh, and nothing else will do.  I went through every contortion I could think of to recast the meta element, to no avail.  I could generate a second meta element, but I could not get rid of the original meta.  It was as though that meta was hard coded into the html5 transformer.
>
> So I have two questions.
>
> Is it possible to circumvent the copy of the existing head subelements, including meta?
> Why is the xslt so stubborn about http-equiv="Content-Type", when the validation rejects it for html5?
>
> Peter West
>
> "...the kingdom of heaven is like a merchant in search of fine pearls, who, finding one pearl of great value, sold all that he had and bought it."
>
> _______________________________________________
> oXygen-user mailing list
> 
> http://www.oxygenxml.com/mailman/listinfo/oxygen-user
>

-------------- next part --------------

   html5.html =
## The html element represents the root of an HTML document.
 element html { html5.html.attlist, html5.head, html5.body }       
   html5.html.attlist &= html5.global.attrs 

   html5.head =
## The head element represents a collection of metadata for the Document.
 element head { html5.head.attlist & html5.head.content } 
   html5.head.attlist &= html5.global.attrs    
   html5.head.content = html5.title? & html5.base? & html5.metadata.class*
   html5.metadata.class |= html5.link | html5.meta

   html5.body =
## The body element represents the main content of the document.
 element body { html5.body.attlist & html5.body.content }      
   html5.body.attlist &= html5.global.attrs                            
   html5.body.content = html5.section.model

   html5.base =  element base { html5.base.attlist }
   html5.base.attlist &= html5.global.attrs & 
      ((html5.href.attr & html5.target.attr?) | html5.target.attr)   

   html5.link =
## The link element allows authors to link their document to other resources.
 element link { html5.link.attlist }
   html5.link.attlist &= html5.global.attrs & html5.href.attr & 
      html5.rel.attr & html5.media.attr? & html5.hreflang.attr? &
      html5.type.mime.attr? & html5.link.sizes.attr?        
   html5.link.sizes.attr = attribute sizes { 'any' | datatype.html5.sizes } 

   html5.meta =
## The meta element represents various kinds of metadata that cannot be expressed using the title, base, link, style, and script elements.
 element meta { html5.meta.attlist }
   html5.meta.attlist &= html5.global.attrs & (
     (html5.meta.name.attr & html5.meta.content.attr)
     | (html5.meta.http-equiv.attr & html5.meta.content.attr)
     | html5.charset.attr )           
   html5.meta.name.attr = attribute name { datatype.string }     
   html5.meta.http-equiv.attr = attribute http-equiv { html5.meta.http-equiv.attr.content }
   html5.meta.http-equiv.attr.content = xsd:string { pattern = "([Dd][Ee][Ff][Aa][Uu][Ll][Tt]\-[Ss][Tt][Yy][Ll][Ee])|([Rr][Ee][Ff][Rr][Ee][Ss][Hh])|([Cc][Oo][Nn][Tt][Ee][Nn][Tt]\-[Tt][Yy][Pp][Ee])" }
   html5.meta.content.attr = attribute content { datatype.string }  

   html5.title =
## The title element represents the document's title or name. Authors should use titles that identify their documents even when they are used out of context, for example in a user's history or bookmarks, or in search results. The document's title is often different from its first heading, since the first heading does not have to stand alone when taken out of context.
 element title { html5.title.attlist & html5.title.content }
   html5.title.attlist &= html5.global.attrs
   html5.title.content = datatype.text