Empty xmlns attribute in XSLT output

mboudreau
Posts: 58
Joined: Sat Jan 07, 2017 1:23 am

Post by mboudreau » Mon Mar 29, 2021 1:17 am

I have an XSLT conversion that is resulting in elements with xmlns="" for reasons I don't understand. I don't think this is the result of an Oxygen bug, but more likely I have an incomplete understanding of how Oxygen (or the Saxon processor) handles namespaces.

The conversion giving the unexpected result converts instances of the JATS DTD to HTML. A typical JATS article for this conversion begins with

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD UCP JATS (Z39.96) Journal Publishing DTD with OASIS Tables with MathML3 v1.2 20190208//EN" "JATS-journalpublishing-oasis-article1-mathml3.ucp.dtd">
<article
  xmlns:ali="http://www.niso.org/schemas/ali/1.0/"
  xmlns:mml="http://www.w3.org/1998/Math/MathML"
  xmlns:oasis="http://www.niso.org/standards/z39-96/ns/oasis-exchange/table"
  xmlns:xlink="http://www.w3.org/1999/xlink" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  article-type="research-article" dtd-version="1.2" xml:lang="en">
and is converted to HTML beginning

Code: Select all

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xmlns:mml="http://www.w3.org/1998/Math/MathML"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      lang="en">
This is an abbreviated version of the XSL stylesheet:

Code: Select all

<xsl:stylesheet version="2.0" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:mml="http://www.w3.org/1998/Math/MathML"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="xs xsl">
    
    <xsl:output method="xml" indent="no" omit-xml-declaration="no"/>
    <xsl:strip-space elements="*"/>
    <xsl:preserve-space elements="string-name"/>

    <xsl:template match="article">
        <html xmlns:epub="http://www.idpf.org/2007/ops" lang="en">
            <head>
            [snip]
            </head>
            <body>
                <article>
                    <xsl:apply-templates/>
                </article>
            </body>
        </html>
    </xsl:template>

   [plus many other templates]

    <xsl:template match="*" name="process-element">
        <xsl:copy copy-namespaces="no">
            <xsl:for-each select="@*">
                <xsl:copy/>
            </xsl:for-each>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
I had been running this conversion on instances of a predecessor of the JATS DTD and hadn't had any problems. Now I'm preparing a version to run on instances of JATS, and I find that every <sup> or <sub> element from the JATS instance appears in the HTML instance as <sup xmlns=""> or <sub xmlns="">.

This was baffling to me because those are the only two elements that would appear in the HTML output with the empty 'xmlns' attribute. I eventually realized that these were the only two elements in the JATS input that were being processed by the default template named "process-element", shown in the code above. When I created templates specifically for those elements, like so--

Code: Select all

<xsl:template match="sup">
    <xsl:element name="sup">
        <xsl:apply-templates/>
    </xsl:element>
</xsl:template>
--the resulting <sup> and <sub> elements appeared in the HTML without the empty 'xmlns' attribute. As it happens, 'sup' and 'sub' are two of a small number of element names that are found in both the JATS and HTML tag sets. I found I was able to generate more instances of the empty 'xmlns' attribute by commenting out the template for the 'p' element and letting that be handled by the default template as well.
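
A note on why the explicit template behaves differently: in XSLT 2.0, xsl:element with an unprefixed name and no namespace attribute resolves that name using the namespace declarations in scope, including the default namespace declared on xsl:stylesheet. The new element therefore lands in the XHTML namespace, just as a literal result element would. The following is a minimal sketch of that behavior, not the original stylesheet; it assumes a stylesheet whose default namespace is XHTML and handles 'sup' and 'sub' with one template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes the XHTML default namespace is declared on xsl:stylesheet. -->
<xsl:stylesheet version="2.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="sup | sub">
        <!-- name is an attribute value template; because there is no namespace
             attribute, the unprefixed name is resolved against the default
             namespace in scope, so the result element is an XHTML element -->
        <xsl:element name="{local-name()}">
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```

By contrast, xsl:copy keeps the source element's own name and namespace, which for these JATS elements is no namespace at all.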

After searching online, I found that this problem had been discussed on StackOverflow a few times, for example at https://stackoverflow.com/questions/206 ... sformation:
If you have xmlns="" appearing on an element in your output then it means that you added an element with no namespace to the tree at a point where there was a default namespace already in force. In order to output such a structure the serializer must countermand that default with xmlns="".
However, I'm not sure if this applies to my problem, because I didn't think there was a default namespace already in force or that I had done anything to remove it. To check this, I added a log message to my default template:

Code: Select all

    <xsl:template match="*" name="process-element">
        <xsl:message>
            <xsl:text>PROCESS-ELEMENT called for element '</xsl:text>
            <xsl:value-of select="name(.)"/>
            <xsl:text>' in namespace '</xsl:text>
            <xsl:value-of select="namespace-uri()"/>
            <xsl:text>'</xsl:text>
        </xsl:message>
        <xsl:copy copy-namespaces="no">
            <xsl:for-each select="@*">
                <xsl:copy/>
            </xsl:for-each>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
And the output I got from Oxygen for every element handled by the default template was "PROCESS-ELEMENT called for element '[whatever]' in namespace ''"

At the same time that I was puzzling over this JATS-to-HTML conversion, I had a JATS-to-JATS conversion that used the same default template and whose output never included the empty 'xmlns' attribute.

Am I troubleshooting this the wrong way? I know how to fix the problem for the moment--create templates explicitly for the elements that are getting the empty 'xmlns' attribute--but I'm still not sure why it's necessary for one conversion but not for the other.

Sorry for such a long-winded description.

Radu
Posts: 7468
Joined: Fri Jul 09, 2004 5:18 pm

Post by Radu » Mon Mar 29, 2021 4:07 pm

Hi,

I also think the answer you found on StackOverflow applies to your case:
If you have xmlns="" appearing on an element in your output then it means that you added an element with no namespace to the tree at a point where there was a default namespace already in force. In order to output such a structure the serializer must countermand that default with xmlns="".
If you want further help with this, I need a small set of complete XML + XSLT samples to reproduce the problem. You can also send them to our support email address (support@oxygenxml.com) if they are too large to paste into the forum thread.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Post by mboudreau » Mon Mar 29, 2021 11:59 pm

Hi Radu,

Before I dump a lot of files on you, perhaps I can focus my question a little better.

When my JATS-to-HTML stylesheet begins like this:

Code: Select all

<xsl:stylesheet version="2.0" 
    xmlns="http://www.w3.org/1999/xhtml" 
    xmlns:epub="http://www.idpf.org/2007/ops"
    xmlns:mml="http://www.w3.org/1998/Math/MathML"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="mml xlink xs xsl">
    
    <xsl:output method="xml" indent="no" omit-xml-declaration="no"/>
    <xsl:strip-space elements="*"/>
    <xsl:preserve-space elements="string-name"/>
and my template for the JATS root element creates the <html> element like this:

Code: Select all

<xsl:template match="article">
   <html lang="en">
      <head>
         [...]
      </head>
      <body>
         [...]
      </body>
   </html>
</xsl:template>
the output file begins like this:

Code: Select all

<html xmlns="http://www.w3.org/1999/xhtml" 
      xmlns:epub="http://www.idpf.org/2007/ops" 
      lang="en">
Question 1: is the 'xmlns' attribute being added because it's included in the <xsl:stylesheet> tag?

If I remove 'xmlns="http://www.w3.org/1999/xhtml"' from the <xsl:stylesheet> tag, it doesn't appear in the output <html> tag, but then Oxygen gives the parsing error 'elements from namespace "" are not allowed' at the very beginning of the HTML file.

Question 2: why does Oxygen require the HTML file to have a default namespace?

Post by Radu » Tue Mar 30, 2021 9:05 am

Hi,
Question 1: is the 'xmlns' attribute being added because it's included in the <xsl:stylesheet> tag?
Yes, because it's declared as a default namespace on the xsl:stylesheet root.
A very small example to reproduce your problem would look like this:

1) XML:

Code: Select all

<root>
    <child></child>
</root>
2) XSL:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    xmlns="abc"
    version="2.0">
    <xsl:output method="xml" indent="no" omit-xml-declaration="no"/>
    <xsl:template match="/">
        <html>
            <xsl:apply-templates/>
        </html>
    </xsl:template>
    
    <xsl:template match="child">
        <xsl:copy copy-namespaces="no">
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
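So the "html" element is emitted in the stylesheet's default namespace, which is "abc". The "child" element, however, is in no namespace in the input, and xsl:copy keeps the element's original name and namespace (copy-namespaces="no" only drops the extra namespace declarations, it does not change the namespace of the element itself). To place a no-namespace element inside an element that is in the "abc" namespace, the serializer has to emit an "xmlns=''" on it.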
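
One possible way to avoid the "xmlns=''" in a fallback template is to rebuild each element in the output namespace instead of copying it. The following is a minimal sketch under stated assumptions, not a drop-in replacement: it assumes the target namespace is XHTML and that every no-namespace input element should end up there under the same local name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the XHTML target namespace is an assumption for illustration. -->
<xsl:stylesheet version="2.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="no" omit-xml-declaration="no"/>

    <xsl:template match="/">
        <html>
            <xsl:apply-templates/>
        </html>
    </xsl:template>

    <!-- Fallback for input elements that are in no namespace: create a new
         element with the same local name in the XHTML namespace, rather than
         copying the no-namespace original with xsl:copy. -->
    <xsl:template match="*[namespace-uri() = '']">
        <xsl:element name="{local-name()}"
                     namespace="http://www.w3.org/1999/xhtml">
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```

The predicate restricts the fallback to no-namespace elements, so elements that already carry a namespace (MathML, for example) are not silently moved into XHTML; those would still need templates of their own.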
If I remove 'xmlns="http://www.w3.org/1999/xhtml"' from the <xsl:stylesheet> tag, it doesn't appear in the output <html> tag, but then Oxygen gives the parsing error 'elements from namespace "" are not allowed' at the very beginning of the HTML file.
Question 2: why does Oxygen require the HTML file to have a default namespace?
What version of Oxygen are you using? Recent versions of Oxygen apply different validation depending on whether they consider the document to be XHTML (in the XHTML namespace) or HTML, which does not have a namespace.
For example if you create a new file with the ".html" extension and the contents:

Code: Select all

<html>
    <head>
        <title></title>
    </head>
    <body></body>
</html>
then open and validate it in Oxygen, what error does Oxygen signal for it?

Regards,
Radu

Post by mboudreau » Tue Mar 30, 2021 6:32 pm

Thanks, Radu. This clarifies things a lot.

I didn't realize that having a default namespace on the stylesheet applied that to the output of the conversion.
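
For contrast, here is a sketch of the same minimal stylesheet from the reply above with the default namespace declaration removed. This would also explain why a conversion whose output has no default namespace never shows the empty 'xmlns' attribute, although that the JATS-to-JATS stylesheet was written this way is an assumption, since it is not shown in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: no default namespace is declared on xsl:stylesheet here. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="no" omit-xml-declaration="no"/>

    <!-- With no default namespace in scope, this literal result element
         is created in no namespace. -->
    <xsl:template match="/">
        <html>
            <xsl:apply-templates/>
        </html>
    </xsl:template>

    <!-- The copied element is also in no namespace, so parent and child
         agree and the serializer has nothing to undeclare. -->
    <xsl:template match="child">
        <xsl:copy copy-namespaces="no">
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```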

I'm using Oxygen version 23.0, which I see now requires <html xmlns="http://www.w3.org/1999/xhtml"> for XHTML files but only <html> for HTML files. (The sample content you provided for a file with a .html extension does not produce an error.) This conversion is producing XHTML files for use in EPUBs, which explains why the default namespace was declared on the stylesheet root when it was originally written.

Thanks again for your help!
