Open XML Validation

tmakita
Posts: 69

Mon May 22, 2017 5:55 am

Hi,
I'm testing https://github.com/jelovirt/com.elovirta.ooxml. When I open the resulting .docx file, Word 2013 reports that it cannot open the file because it contains an error.
So I tried to open and validate this .docx file in oXygen 19.0. When I open "word/document.xml" inside the .docx file, oXygen reports errors for some parts as expected, but some of the reported errors do not make sense to me. For instance:

<w:hyperlink r:id="rIdHyperlink104">
    <w:r>
        <w:rPr>
            <w:i/>
            <w:rStyle w:val="Hyperlink"/>
        </w:rPr>
        <w:t>Creative Commons Attribution-ShareAlike License</w:t>
    </w:r>
</w:hyperlink>


oXygen reports errors about w:rStyle.

System ID: zip:file:/C:/Users/toshi/OneDrive/Documents/test/wml/wml-link/out/docx/mWebLinkTest.docx!/word/document.xml
Main validation file: zip:file:/C:/Users/toshi/OneDrive/Documents/test/wml/wml-link/out/docx/mWebLinkTest.docx!/word/document.xml
Schema: C:\Program Files\Oxygen XML Editor 19\frameworks\ooxml\schemas\main.nvdl
Engine name: Jing
Severity: error
Description: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"http://schemas.openxmlformats.org/wordprocessingml/2006/main":rStyle}'. One of '{"http://schemas.openxmlformats.org/wordprocessingml/2006/main":iCs, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":caps, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":smallCaps, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":strike, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":dstrike, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":outline, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":shadow, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":emboss, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":imprint, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":noProof, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":snapToGrid, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":vanish, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":webHidden, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":color, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":spacing, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":w, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":kern, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":position, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":sz, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":szCs, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":highlight, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":u, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":effect, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":bdr, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":shd, 
"http://schemas.openxmlformats.org/wordprocessingml/2006/main":fitText, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":vertAlign, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":rtl, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":cs, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":em, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":lang, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":eastAsianLayout, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":specVanish, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":oMath, "http://schemas.openxmlformats.org/wordprocessingml/2006/main":rPrChange}' is expected.
Start location: 159:54
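
The part named in the System ID above, word/document.xml, can also be pulled out of the .docx package outside oXygen, since a .docx file is a ZIP archive. A minimal sketch using only the Python standard library; the function name load_main_document is made up for this illustration, and it only checks well-formedness, not schema validity:

```python
import zipfile
import xml.etree.ElementTree as ET


def load_main_document(docx):
    """Parse word/document.xml from a .docx (path or binary file object).

    Raises xml.etree.ElementTree.ParseError if the part is not well-formed
    and KeyError if the archive has no word/document.xml entry.
    """
    with zipfile.ZipFile(docx) as package:
        with package.open("word/document.xml") as part:
            return ET.parse(part).getroot()
```

The returned root element can then be inspected or passed to whatever validator is at hand.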


According to the newest schema (wordml.xsd) from http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html, this should not be reported as an error.

  <xsd:complexType name="CT_RPrOriginal">
    <xsd:sequence>
      <xsd:group ref="EG_RPrBase" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  ...
  <xsd:group name="EG_RPrBase">
    <xsd:choice>
      <xsd:element name="rStyle" type="CT_String"/>
      <xsd:element name="rFonts" type="CT_Fonts"/>
      <xsd:element name="b" type="CT_OnOff"/>
      <xsd:element name="bCs" type="CT_OnOff"/>
      <xsd:element name="i" type="CT_OnOff"/>
      <xsd:element name="iCs" type="CT_OnOff"/>
      <xsd:element name="caps" type="CT_OnOff"/>
      <xsd:element name="smallCaps" type="CT_OnOff"/>
      <xsd:element name="strike" type="CT_OnOff"/>
      <xsd:element name="dstrike" type="CT_OnOff"/>
      <xsd:element name="outline" type="CT_OnOff"/>
      ...
    </xsd:choice>
  </xsd:group>


In contrast, the schema that oXygen is (probably) using defines the group as follows:
[C:\Program Files\Oxygen XML Editor 19\frameworks\ooxml\schemas\xsd]

   <xsd:group name="EG_RPrBase">
      <xsd:sequence>
         <xsd:element name="rStyle" type="CT_String" minOccurs="0">
            <xsd:annotation>
               <xsd:documentation>Referenced Character Style</xsd:documentation>
            </xsd:annotation>
         </xsd:element>
         <xsd:element name="rFonts" type="CT_Fonts" minOccurs="0">
            <xsd:annotation>
               <xsd:documentation>Run Fonts</xsd:documentation>
            </xsd:annotation>
         </xsd:element>
         <xsd:element name="b" type="CT_OnOff" minOccurs="0">
            <xsd:annotation>
               <xsd:documentation>Bold</xsd:documentation>
            </xsd:annotation>
         </xsd:element>
         <xsd:element name="bCs" type="CT_OnOff" minOccurs="0">
            <xsd:annotation>
               <xsd:documentation>Complex Script Bold</xsd:documentation>
            </xsd:annotation>
         </xsd:element>
         <xsd:element name="i" type="CT_OnOff" minOccurs="0">
            <xsd:annotation>
               <xsd:documentation>Italics</xsd:documentation>
            </xsd:annotation>
         </xsd:element>
                        ...


So it is natural that oXygen reports an error for w:rStyle, because under this schema it must be the first child of w:rPr.
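
As a possible workaround until the shipped schema is updated, the children of each w:rPr could be reordered so that they also satisfy the older sequence-based definition. The sketch below is an assumption-laden illustration, not part of the plugin or of oXygen: the order list is pieced together from this thread only (the first five names from the older EG_RPrBase sequence, the rest from the "is expected" list in the error message), and the function name sort_run_properties is made up.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed order, assembled from the schema excerpt and the error message above.
RPR_ORDER = [
    "rStyle", "rFonts", "b", "bCs", "i", "iCs", "caps", "smallCaps",
    "strike", "dstrike", "outline", "shadow", "emboss", "imprint",
    "noProof", "snapToGrid", "vanish", "webHidden", "color", "spacing",
    "w", "kern", "position", "sz", "szCs", "highlight", "u", "effect",
    "bdr", "shd", "fitText", "vertAlign", "rtl", "cs", "em", "lang",
    "eastAsianLayout", "specVanish", "oMath", "rPrChange",
]
RANK = {f"{{{W_NS}}}{name}": index for index, name in enumerate(RPR_ORDER)}


def sort_run_properties(root):
    """Reorder the children of every w:rPr in place.

    Returns the number of w:rPr elements whose child order changed.
    Elements not in RPR_ORDER are moved to the end, keeping their
    relative order (sorted() is stable).
    """
    changed = 0
    for rpr in root.iter(f"{{{W_NS}}}rPr"):
        children = list(rpr)
        ordered = sorted(children, key=lambda el: RANK.get(el.tag, len(RANK)))
        if any(a is not b for a, b in zip(ordered, children)):
            rpr[:] = ordered
            changed += 1
    return changed


# Usage with the run properties from the first post:
run = ET.fromstring(
    f'<w:r xmlns:w="{W_NS}"><w:rPr><w:i/>'
    '<w:rStyle w:val="Hyperlink"/></w:rPr></w:r>'
)
sort_run_properties(run)
print([child.tag.split("}")[1] for child in run[0]])  # ['rStyle', 'i']
```

Because the newer definition is an unbounded choice, a sorted w:rPr remains valid against it as well.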

But I found that this schema is a bit old.
Do you have any plans to update the WordprocessingML schema files to the newest version?

I have attached a sample DITA instance.

https://1drv.ms/u/s!AkbL99fLhxKUgP9CxUN0rIA_mzF4MA

Regards,
--
/*--------------------------------------------------
Toshihiko Makita
Development Group. Antenna House, Inc. Ina Branch
Web site:
http://www.antenna.co.jp/
http://www.antennahouse.com/
--------------------------------------------------*/
Radu
Posts: 5088

Re: Open XML Validation

Mon May 22, 2017 10:05 am

Hi Toshihiko,

I will add an internal issue to update the XML Schemas we ship for OOXML validation.
About the publishing problem, I have no issue opening the OOXML file with MS Word 2016.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
tmakita
Posts: 69

Re: Open XML Validation

Tue May 23, 2017 5:29 am

Hi Radu,

> About the publishing problem, I have no issue opening the OOXML file with MS Word 2016.

It may be my mistake. The error occurred in more complex DITA documents.

> I will add an internal issue to update the XML Schemas we ship for OOXML validation.

For your reference, I found a fatal XML Schema error in the OfficeOpenXML-XMLSchema-Strict/wml.xsd supplied by both ISO and ECMA when I opened it in oXygen.
There is a default value specified as "no" although its type is defined as xs:boolean.
I'm wondering how to solve this problem because it is an international standard.
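
For context, the lexical space of xs:boolean is limited to "true", "false", "1" and "0", so "no" cannot be a legal default for that type. A rough sketch of how such declarations might be located in a schema file follows. It makes simplifying assumptions: only declarations typed directly as <prefix>:boolean are checked, named simple types derived from xs:boolean are not resolved, and the function name invalid_boolean_defaults is made up for this illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Legal literals for xs:boolean according to XML Schema Part 2.
XS_BOOLEAN_LITERALS = {"true", "false", "1", "0"}


def invalid_boolean_defaults(xsd_text):
    """Return (name, default) pairs whose boolean default is not a legal literal.

    Attribute declarations are reported before element declarations.
    """
    root = ET.fromstring(xsd_text)
    problems = []
    for tag in ("attribute", "element"):
        for decl in root.iter(f"{{{XSD_NS}}}{tag}"):
            type_name = decl.get("type", "")
            default = decl.get("default")
            # Assumption: any prefix before "boolean" is bound to XSD_NS.
            if default is None or type_name.split(":")[-1] != "boolean":
                continue
            # Surrounding whitespace is collapsed before the literal is checked.
            if default.strip() not in XS_BOOLEAN_LITERALS:
                problems.append((decl.get("name"), default))
    return problems
```

Changing each reported default from "no" to "false" would be one way to patch a local copy of the schema, but whether that matches the intent of the standard is for its maintainers to decide.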

Regards,
Radu
Posts: 5088

Re: Open XML Validation

Tue May 23, 2017 9:56 am

Hi Toshihiko,

So:

> It may be my mistake. The error occurred in more complex DITA documents.


If you put together a DITA project for which the generated DOCX is invalid you can create a new issue directly on Jarno's plugin issues list:

https://github.com/jelovirt/com.elovirta.ooxml

> For your reference, I found a fatal XML Schema error in the OfficeOpenXML-XMLSchema-Strict/wml.xsd supplied by both ISO and ECMA when I opened it in oXygen.
> There is a default value specified as "no" although its type is defined as xs:boolean.
> I'm wondering how to solve this problem because it is an international standard.


Probably Microsoft does not validate with a 100% standards conformant parser. When we update the schemas we'll probably manually fix this error.

Regards,
Radu
Radu
Posts: 5088

Re: Open XML Validation

Mon Oct 02, 2017 1:53 pm

Hi,

Oxygen 19.1 was released a couple of days ago and it should include updated versions of the MS Word XML Schemas it uses internally.

Regards,
Radu
