Open XML Validation

tmakita

Open XML Validation

Post by tmakita »

Hi,
I'm testing https://github.com/jelovirt/com.elovirta.ooxml. When I open the resulting .docx file, Word 2013 reports that it cannot open the file because it contains an error.
So I tried to open and validate this .docx file with oXygen 19.0. When I open "word/document.xml" inside the .docx file, oXygen reports errors for some parts as I would expect, but some of the reported errors I cannot understand. For instance:

Code: Select all

<w:hyperlink r:id="rIdHyperlink104">
  <w:r>
    <w:rPr>
      <w:i/>
      <w:rStyle w:val="Hyperlink"/>
    </w:rPr>
    <w:t>Creative Commons Attribution-ShareAlike License</w:t>
  </w:r>
</w:hyperlink>
oXygen reports the following error for w:rStyle:

Code: Select all

System ID: zip:file:/C:/Users/toshi/OneDrive/Documents/test/wml/wml-link/out/docx/mWebLinkTest.docx!/word/document.xml
Main validation file: zip:file:/C:/Users/toshi/OneDrive/Documents/test/wml/wml-link/out/docx/mWebLinkTest.docx!/word/document.xml
Schema: C:\Program Files\Oxygen XML Editor 19\frameworks\ooxml\schemas\main.nvdl
Engine name: Jing
Severity: error
Description: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"http://schemas.openxmlformats.org/wordprocessingml/2006/main":rStyle}'. One of '{"http://schemas.openxmlformats.org/wordprocessingml/2006/main":iCs, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":caps, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":smallCaps, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":strike, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":dstrike, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":outline, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":shadow, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":emboss, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":imprint, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":noProof, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":snapToGrid, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":vanish, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":webHidden, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":color, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":spacing, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":w, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":kern, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":position, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":sz, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":szCs, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":highlight, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":u, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":effect, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":bdr, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":shd, 
"http://schemas.openxmlformats.org/wordprocessingml/2006/main":fitText, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":vertAlign, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":rtl, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":cs, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":em, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":lang, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":eastAsianLayout, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":specVanish, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":oMath, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":rPrChange}' is expected.
Start location: 159:54
According to the newest schemas (wordml.xsd) from http://standards.iso.org/ittf/PubliclyA ... index.html, this should not be reported as an error:

Code: Select all

<xsd:complexType name="CT_RPrOriginal">
  <xsd:sequence>
    <xsd:group ref="EG_RPrBase" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
...
<xsd:group name="EG_RPrBase">
  <xsd:choice>
    <xsd:element name="rStyle" type="CT_String"/>
    <xsd:element name="rFonts" type="CT_Fonts"/>
    <xsd:element name="b" type="CT_OnOff"/>
    <xsd:element name="bCs" type="CT_OnOff"/>
    <xsd:element name="i" type="CT_OnOff"/>
    <xsd:element name="iCs" type="CT_OnOff"/>
    <xsd:element name="caps" type="CT_OnOff"/>
    <xsd:element name="smallCaps" type="CT_OnOff"/>
    <xsd:element name="strike" type="CT_OnOff"/>
    <xsd:element name="dstrike" type="CT_OnOff"/>
    <xsd:element name="outline" type="CT_OnOff"/>
    ...
  </xsd:choice>
</xsd:group>
In contrast, the schema that oXygen (probably) uses is defined as follows:
[C:\Program Files\Oxygen XML Editor 19\frameworks\ooxml\schemas\xsd]

Code: Select all

<xsd:group name="EG_RPrBase">
  <xsd:sequence>
    <xsd:element name="rStyle" type="CT_String" minOccurs="0">
      <xsd:annotation>
        <xsd:documentation>Referenced Character Style</xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:element name="rFonts" type="CT_Fonts" minOccurs="0">
      <xsd:annotation>
        <xsd:documentation>Run Fonts</xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:element name="b" type="CT_OnOff" minOccurs="0">
      <xsd:annotation>
        <xsd:documentation>Bold</xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:element name="bCs" type="CT_OnOff" minOccurs="0">
      <xsd:annotation>
        <xsd:documentation>Complex Script Bold</xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:element name="i" type="CT_OnOff" minOccurs="0">
      <xsd:annotation>
        <xsd:documentation>Italics</xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    ...
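Given this schema, it is natural that oXygen reports an error for w:rStyle: because EG_RPrBase is an xsd:sequence here, w:rStyle must come before every other child of w:rPr (it must be the first child when present).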
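Until the shipped schema is updated, the error can presumably be avoided on the generating side by emitting w:rStyle before the other run properties. Below is a sketch of the same run with only the order of the two w:rPr children swapped; it should be accepted both by the sequence-based schema above and by the newer choice-based one. The namespace declarations are added only so the fragment is well-formed on its own (the w: URI is the one from the error message; the r: URI is the standard Office relationships namespace).

```xml
<!-- Namespace declarations added only to make this fragment self-contained;
     in document.xml they are typically declared on w:document. -->
<w:hyperlink r:id="rIdHyperlink104"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:r>
    <w:rPr>
      <!-- rStyle moved ahead of i: valid for the sequence-based
           EG_RPrBase and for the choice-based one -->
      <w:rStyle w:val="Hyperlink"/>
      <w:i/>
    </w:rPr>
    <w:t>Creative Commons Attribution-ShareAlike License</w:t>
  </w:r>
</w:hyperlink>
```

Following the element order of the older sequence is the safer choice for output that may be checked against either schema version.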

However, this schema is a bit old.
Do you have any plans to update the WordprocessingML schema files to the newest version?

I have attached a sample DITA instance:

https://1drv.ms/u/s!AkbL99fLhxKUgP9CxUN0rIA_mzF4MA

Regards,
--
/*--------------------------------------------------
Toshihiko Makita
Development Group. Antenna House, Inc. Ina Branch
Web site:
http://www.antenna.co.jp/
http://www.antennahouse.com/
--------------------------------------------------*/
Radu

Re: Open XML Validation

Post by Radu »

Hi Toshihiko,

I will add an internal issue to update the XML Schemas we ship for OOXML validation.
About the publishing problem, I have no issue opening the OOXML file with MS Word 2016.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
tmakita

Re: Open XML Validation

Post by tmakita »

Hi Radu,

> About the publishing problem, I have no issue opening the OOXML file with MS Word 2016.

It may be my mistake. The error occurred in more complex DITA documents.

> I will add an internal issue to update the XML Schemas we ship for OOXML validation.

For your reference, when I opened OfficeOpenXML-XMLSchema-Strict/wml.xsd (as supplied by both ISO and ECMA) in oXygen, I found a fatal XML Schema error in the schema itself.
A default value is given as "no" although the attribute type is xs:boolean, whose only valid values are true, false, 1 and 0.
I'm wondering how this problem can be solved, since it is an international standard.
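To illustrate the kind of error described above: the attribute name below is hypothetical (the actual declaration in wml.xsd is not reproduced here), and the xsd prefix is assumed to be bound to the XML Schema namespace as in the snippets earlier in this thread. Because "no" is not a legal xs:boolean literal, the default value constraint itself is invalid and the schema is reported as erroneous. A manual fix, under these assumptions, would be to switch to one of the valid literals:

```xml
<!-- Hypothetical attribute name "exampleFlag", for illustration only. -->

<!-- Rejected: "no" is not in the lexical space of xsd:boolean,
     so the schema processor flags the declaration itself. -->
<xsd:attribute name="exampleFlag" type="xsd:boolean" default="no"/>

<!-- Accepted: one of true, false, 1, 0 ("false" keeps the meaning of "no"). -->
<xsd:attribute name="exampleFlag" type="xsd:boolean" default="false"/>
```

The two declarations are a before/after pair and would not appear together in one schema.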

Regards,
Toshihiko Makita
Radu

Re: Open XML Validation

Post by Radu »

Hi Toshihiko,

So:
> It may be my mistake. The error occurred in more complex DITA documents.
If you put together a DITA project for which the generated DOCX is invalid, you can create a new issue directly on Jarno's plugin issues list:

https://github.com/jelovirt/com.elovirta.ooxml
> For your reference, when I opened OfficeOpenXML-XMLSchema-Strict/wml.xsd (as supplied by both ISO and ECMA) in oXygen, I found a fatal XML Schema error in the schema itself.
> A default value is given as "no" although the attribute type is xs:boolean, whose only valid values are true, false, 1 and 0.
> I'm wondering how this problem can be solved, since it is an international standard.
Microsoft probably does not validate with a 100% standards-conformant parser. When we update the schemas, we'll probably fix this error manually.

Regards,
Radu
Radu

Re: Open XML Validation

Post by Radu »

Hi,

Oxygen 19.1 was released a couple of days ago, and it should contain updated versions of the MS Word XML Schemas it uses internally.

Regards,
Radu