Migrating MS Office Documents to DITA

Oxygen XML Author plugin integrates the entire DITA for Publishers plugin suite and includes several helpful tools that allow you to migrate content from Microsoft Office® Word (and other similar types of documents) to DITA. Migration from proprietary formats to XML is rarely perfect, and you may need to make manual changes to the converted content, but the methods described below should help you find the best approach for your particular case:

Method 1

  1. Open the document in MS Office (or a similar application), select all the content, and copy it.
  2. Open Oxygen XML Author plugin and create a new DITA topic.
  3. Paste the copied content into the topic in Author mode. The Smart Paste functionality will attempt to convert the content to DITA.

Method 2

  1. Save your document as HTML.
  2. Once you have saved it as HTML, you have two options:
    • In Oxygen XML Author plugin, select File > Import > HTML File to import it as XHTML. Then open the resulting XHTML document in Oxygen XML Author plugin and run one of the XHTML to DITA transformation scenarios to convert the content to DITA.
    • Open the HTML file in any web browser, select all of its content, and copy it. Then open Oxygen XML Author plugin, create a new DITA topic, and paste the copied content in Author mode. The Smart Paste functionality will attempt to convert the HTML content to DITA.
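HTML saved from Word often carries Word-specific presentation classes (such as MsoNormal) that have no direct DITA equivalent. Before you import or paste, a quick inventory of the saved file can show which elements and classes the conversion will have to deal with. The following is a minimal sketch that uses Python's standard html.parser module. The helper name inventory_html and the choice to treat Mso* classes as a warning sign are assumptions made for illustration. Neither is part of Oxygen XML Author plugin.

```python
from collections import Counter
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Counts element names and class values in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, value in attrs:
            if name == "class" and value:
                # An element may carry several space-separated classes.
                for cls in value.split():
                    self.classes[cls] += 1


def inventory_html(html_text):
    """Hypothetical helper: return (tag counts, class counts) for an HTML string."""
    parser = TagInventory()
    parser.feed(html_text)
    parser.close()
    return parser.tags, parser.classes


if __name__ == "__main__":
    # Stand-in for the contents of an HTML file saved from Word.
    sample = (
        "<html><body>"
        "<h1>Title</h1>"
        "<p class=MsoNormal>First paragraph</p>"
        "<p class=MsoListParagraph>List-like paragraph</p>"
        "</body></html>"
    )
    tags, classes = inventory_html(sample)
    print(tags.most_common())
    # Word-specific classes hint that list or style semantics may not survive conversion.
    print([c for c in classes if c.startswith("Mso")])
```

Suppose many p elements carry Mso* list classes but the file has few ul or ol elements. In that case, the lists probably live in styling rather than in markup, and they may need manual attention after conversion.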

Method 3

  1. Open the document in the free LibreOffice application and save it as DocBook.
  2. Open the DocBook document in Oxygen XML Author plugin.
  3. Run the predefined transformation scenario called DocBook to DITA.
  4. You may need to make some manual adjustments for elements that couldn't be mapped.
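Step 4 is easier if you know in advance which DocBook elements the exported file contains. The following minimal sketch walks a DocBook document with Python's xml.etree.ElementTree and counts the elements that are missing from a small mapping table. The table KNOWN_MAPPINGS and the function names are hypothetical and illustrative only. The table is not the mapping used by the predefined DocBook to DITA scenario, so treat the report as a starting point for review rather than a list of real conversion failures.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, illustrative DocBook -> DITA element mapping.
# This is NOT the table used by the DocBook to DITA scenario.
KNOWN_MAPPINGS = {
    "article": "topic",
    "section": "section",
    "title": "title",
    "para": "p",
    "itemizedlist": "ul",
    "orderedlist": "ol",
    "listitem": "li",
    "emphasis": "i",
    "table": "table",
}


def local_name(tag):
    """Drop a '{namespace}' prefix so DocBook 4 and DocBook 5 tags compare equally."""
    return tag.rsplit("}", 1)[-1]


def unmapped_elements(docbook_text):
    """Count element names that have no entry in KNOWN_MAPPINGS."""
    root = ET.fromstring(docbook_text)
    missing = Counter()
    for elem in root.iter():
        name = local_name(elem.tag)
        if name not in KNOWN_MAPPINGS:
            missing[name] += 1
    return missing


if __name__ == "__main__":
    sample = """<article>
  <title>Report</title>
  <epigraph><para>A quotation</para></epigraph>
  <para>Body text</para>
</article>"""
    print(unmapped_elements(sample))
```

ElementTree does not load external DTDs. A DocBook file that relies on DTD-defined entities may therefore need those entities resolved before it can be parsed this way.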

Method 4

  1. If the document is in the MS Word DOCX format, you can open it in the Archive Browser view in Oxygen XML Author plugin and then open the word/document.xml file contained in the archive.
  2. Run the predefined transformation scenario called DOCX DITA. This scenario runs a build file on the DOCX archive and should produce a DITA project that contains a DITA map and multiple topics.
  3. You may need to manually reconfigure how DOCX styles are mapped to DITA markup.
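Mapping DOCX styles to DITA starts with knowing which styles the document actually uses. A DOCX file is a ZIP archive, and its main body is stored in word/document.xml. Each paragraph there references its style through a w:pStyle element. The following rough sketch reads that part with Python's zipfile and xml.etree.ElementTree modules and counts the paragraph style IDs. It is not part of the DOCX DITA scenario. The helper names paragraph_styles and make_demo_docx are hypothetical. Style IDs such as Heading1 can also differ from the display names shown in Word.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML main namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def paragraph_styles(docx):
    """Count paragraph style IDs in word/document.xml.

    `docx` may be a file path or a binary file-like object,
    since zipfile.ZipFile accepts either.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    counts = Counter()
    for para in root.iter(W + "p"):
        style = para.find(W + "pPr/" + W + "pStyle")
        # No w:pStyle means the paragraph uses the default paragraph style.
        style_id = style.get(W + "val") if style is not None else "(default)"
        counts[style_id] += 1
    return counts


def make_demo_docx():
    """Build a tiny in-memory archive holding only word/document.xml (not a complete DOCX)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Intro</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Plain text</w:t></w:r></w:p>'
        '<w:p><w:pPr><w:pStyle w:val="ListParagraph"/></w:pPr><w:r><w:t>Item</w:t></w:r></w:p>'
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Replace make_demo_docx() with the path to a real .docx file.
    for style_id, count in paragraph_styles(make_demo_docx()).most_common():
        print(f"{style_id}: {count}")
```

Frequently used styles with no obvious DITA counterpart are the likeliest candidates for the manual mapping work described in step 3.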