Docx to xml
- Posts: 2
- Joined: Mon May 22, 2017 5:40 pm
Docx to xml
Hi
I'm still new to this, so apologies in advance if what I'm asking about is painfully obvious.
I have to publish content using a content management system that validates the content against an XML schema. However, the content is originally written in Word documents, so at the moment we manually add the tags needed for the document to be validated and published.
Is there an easier way to do this? Is it possible to convert a .docx file into an XML file that uses the tags defined by the XML schema?
Many Thanks
- Posts: 9434
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Docx to xml
Hi,
One possibility is to save the Word document as HTML from MS Office, convert the HTML to XHTML (Oxygen has a File->Import feature which can do that), and then use XSLT processing to convert the XHTML to your target vocabulary.
Alternatively, you can open the DOCX file in the Oxygen Archive Browser view, open the "document.xml" file inside it, which contains the main document's content, and then apply a custom XSLT stylesheet to try to convert it to another XML format.
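To illustrate the second option outside of Oxygen: a DOCX file is a ZIP archive, and the main content is normally stored in it as "word/document.xml". The following is only a rough Python sketch of the idea, not a replacement for a proper XSLT stylesheet. The target element names ("article", "title", "para") and the "Heading1" style mapping are made-up placeholders; you would replace them with the tags your own schema defines.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical mapping from Word paragraph style IDs to target element names.
# Adjust it to the styles your authors use and the tags your schema expects.
STYLE_TO_TAG = {"Heading1": "title"}


def docx_to_target_xml(docx_file, root_tag="article", default_tag="para"):
    """Return an XML string built from the paragraphs of a .docx file.

    docx_file may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        # Assumes the main part is at the usual location in the archive.
        source = ET.fromstring(archive.read("word/document.xml"))

    root = ET.Element(root_tag)
    for paragraph in source.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t elements.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        if not text.strip():
            continue  # skip empty paragraphs
        style = paragraph.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        child = ET.SubElement(root, STYLE_TO_TAG.get(style_id, default_tag))
        child.text = text
    return ET.tostring(root, encoding="unicode")
```

Calling docx_to_target_xml("input.docx") (the file name is just an example) returns the converted XML as a string, which you can then validate against your schema. This sketch only handles plain paragraphs and drops tables, lists, images and inline formatting, so for anything beyond simple documents an XSLT stylesheet applied to "document.xml" is likely the better route.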
For certain target XML vocabularies such as DITA and DocBook, Oxygen has a special feature called "Smart Paste" (based on a set of predefined internal XSLT stylesheets) which can help with the conversion:
http://blog.oxygenxml.com/2016/05/how-t ... -dita.html
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com