Docx to xml

This should cover W3C XML Schema, Relax NG and DTD related problems.
apzkhan
Posts: 2
Joined: Mon May 22, 2017 5:40 pm

Docx to xml

Post by apzkhan »

Hi

I'm still new to this, so apologies in advance if what I'm asking about is painfully obvious.

I have to publish content through a content management system that validates the content against an XML schema. However, the content is originally written in Word documents, so at the moment we manually add the tags needed for each document to validate and be published.

I wanted to ask if there is an easier way to do this: is it possible to convert a .docx file into an XML file that uses the tags defined by the XML schema?

Many Thanks
Radu
Posts: 9049
Joined: Fri Jul 09, 2004 5:18 pm

Re: Docx to xml

Post by Radu »

Hi,

One possibility is to save the Word document as HTML from MS Office, convert the HTML to XHTML (Oxygen has a File->Import feature which can do that), and then use XSLT processing to convert the XHTML to your target vocabulary.
Alternatively, you can open the DOCX file in the Oxygen Archive Browser view, open "document.xml" from it (the file that holds the main document's contents), and then apply a custom XSLT stylesheet to convert it to another XML format.
For certain target XML vocabularies such as DITA and DocBook, Oxygen has a special feature called "Smart Paste" (based on a set of predefined internal XSLT stylesheets) which can help with the conversion:

http://blog.oxygenxml.com/2016/05/how-t ... -dita.html
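
For anyone who wants to try the "document.xml" route outside of Oxygen, here is a rough sketch in Python rather than XSLT. It is only an illustration under assumptions: the target element names ("document", "title", "subtitle", "para") and the style mapping are hypothetical placeholders for whatever your schema defines, and it only handles plain paragraph text (no lists, tables, images or inline formatting).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical mapping from Word paragraph style IDs to target element names.
# Replace these with the element names your own schema defines.
STYLE_TO_TAG = {"Heading1": "title", "Heading2": "subtitle"}
DEFAULT_TAG = "para"


def docx_to_xml(source, root_tag="document"):
    """Build a simple XML tree from the paragraphs of a DOCX file.

    `source` may be a file path or a binary file-like object.
    """
    # A DOCX file is a ZIP archive; the main text lives in word/document.xml.
    with zipfile.ZipFile(source) as archive:
        body = ET.fromstring(archive.read("word/document.xml"))

    root = ET.Element(root_tag)
    for para in body.iter(f"{{{W_NS}}}p"):
        # Join the text of all runs in this paragraph.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if not text.strip():
            continue  # skip empty paragraphs
        # Look up the paragraph style, if one is set.
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        child = ET.SubElement(root, STYLE_TO_TAG.get(style_id, DEFAULT_TAG))
        child.text = text
    return root
```

Calling ET.tostring(docx_to_xml("input.docx"), encoding="unicode") would then give the converted markup as a string ("input.docx" is a placeholder file name). The result still has to be validated against the schema, since nothing here guarantees that the generated structure conforms to it.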

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
apzkhan
Posts: 2
Joined: Mon May 22, 2017 5:40 pm

Re: Docx to xml

Post by apzkhan »

Thanks, I'll give them a go and see how far I can get.