Migrating MS Office Documents to DITA

Oxygen XML Editor integrates the entire DITA for Publishers plugin suite and provides several ways to migrate content from Microsoft Office® (and other Office-type formats) to DITA. Various other formats can also be migrated. For more information, see Migrating Various Document Formats to and from DITA.

Migration from Office-type formats to XML is rarely perfect, and the converted content may need manual changes, but the methods described below should help you find the best approach for your particular case.

Oxygen's Batch Documents Converter Add-on (Multiple Documents)

The Oxygen Batch Documents Converter add-on can be installed in Oxygen XML Editor to provide the ability to convert one or more documents to various formats.

For more details about the main stages of the Word to DITA migration using the Batch Documents Converter add-on, see the following blog post: Migrating MS Word to DITA using Batch Documents Converter.

Note: The Batch Documents Converter add-on is the recommended way to convert one or more Word documents to DITA content.

Smart Paste (Single Document)

  1. Open the document in MS Office (or other similar application), select all the content, and copy it.
  2. Open Oxygen XML Editor and create a new DITA topic.
  3. Paste the selected content in Author mode. The Smart Paste functionality will attempt to convert the content to DITA structure.

HTML to DITA (Single Document)

  1. Save your document as HTML.
  2. Once you have converted it to HTML, you have two options:
    • In Oxygen XML Editor, select File > Import/Convert > HTML File to XHTML to import it as XHTML. Then, open the XHTML in Oxygen XML Editor and use one of the XHTML to DITA transformation scenarios to convert the content to DITA structure.
    • Open the HTML file in any web browser, select all of its content, and copy it. Then, open Oxygen XML Editor, create a new DITA topic, and paste the selected content in Author mode. The Smart Paste functionality will attempt to convert the HTML content to DITA structure.

Word to LibreOffice to DITA (Single Document)

  1. Open the document in the LibreOffice application and save it as DocBook.
  2. Open the DocBook document in Oxygen XML Editor.
  3. Run the built-in DocBook to DITA transformation scenario.
  4. You may need to make some manual adjustments for elements that could not be mapped.

Word to DITA using DITA for Publishers (Single Document)

  1. Save the document in the MS Word DOCX format.
  2. Open it in the Archive Browser view in Oxygen XML Editor and then open the document.xml file contained in the archive.
  3. Run the built-in DOCX DITA transformation scenario. This scenario runs a build file over the DOCX archive and should produce a DITA project that contains a DITA map and multiple topics.
  4. You may need to do some manual reconfiguration to map DOCX styles to DITA content. The XSLT conversion is part of the DITA for Publishers plugin, and its documentation is available here: http://www.dita4publishers.org/d4p-users-guide/user_docs/d4p-users-guide/word2dita/word2dita-intro.html.
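
Before adjusting the style mapping, it can help to know which paragraph styles a document actually uses. The following is a minimal sketch, not part of Oxygen or DITA for Publishers: it relies only on the facts that a DOCX file is a ZIP archive and that paragraph styles are recorded in word/document.xml. The function name count_paragraph_styles and the "(default)" label are illustrative assumptions.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_paragraph_styles(docx_file):
    """Return a Counter of paragraph style IDs used in a DOCX file.

    docx_file may be a path or a binary file-like object. Paragraphs
    without an explicit style are counted under "(default)"
    (a label chosen for this sketch).
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    counts = Counter()
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # The style reference lives in <w:pPr><w:pStyle w:val="..."/></w:pPr>
        style = paragraph.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        counts[style_id or "(default)"] += 1
    return counts
```

Calling count_paragraph_styles on a DOCX path and printing the result of most_common() gives a quick inventory of the styles that need a mapping. Note that the values are style IDs (for example, Heading1), which can differ from the display names shown in Word.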

Word to DocBook to DITA (Multiple Documents)

  1. Use a tool to convert the documents to DocBook. For example, Pandoc is a free document converter engine that can convert DOCX documents to DocBook. According to Pandoc's manual, you can specify multiple input files and use wildcards in the commands.
  2. Save the newly converted DocBook documents somewhere in your project.
  3. Perform a batch transformation on all the newly converted DocBook documents:
    1. Select all the DocBook documents in the Project view.
    2. Right-click the selected files and choose Transform > Configure Transformation Scenario(s).
    3. Apply the built-in DocBook to DITA transformation scenario.
  4. You may need to make some manual adjustments in the resulting documents for elements that could not be mapped.
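
The first step of this procedure could be scripted along the following lines. This is a sketch under stated assumptions rather than a recommended workflow: it assumes Pandoc is installed and on the PATH, that each DOCX file should become its own DocBook file, and the function names docbook_command and convert_folder are hypothetical helpers. By default it only builds the commands (a dry run) so that they can be reviewed before anything is executed.

```python
import subprocess
from pathlib import Path


def docbook_command(docx_path, out_dir):
    """Build the pandoc argument list for one DOCX to DocBook conversion."""
    docx_path = Path(docx_path)
    target = Path(out_dir) / (docx_path.stem + ".xml")
    # -f/-t select the reader and writer, -s asks for a standalone document
    return ["pandoc", "-f", "docx", "-t", "docbook", "-s",
            "-o", str(target), str(docx_path)]


def convert_folder(src_dir, out_dir, dry_run=True):
    """Build (and optionally run) one command per .docx file in src_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = [docbook_command(path, out_dir)
                for path in sorted(Path(src_dir).glob("*.docx"))]
    if not dry_run:
        for command in commands:
            # Raises CalledProcessError if pandoc reports a failure
            subprocess.run(command, check=True)
    return commands
```

Running convert_folder with dry_run=False on a folder of Word documents should leave one .xml file per document in the output folder, ready to be selected in the Project view for the batch transformation.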

Word to HTML/Markdown to DITA (Multiple Documents)

  1. Use a tool to convert the documents to HTML or Markdown. For example, Pandoc is a free document converter engine that can convert DOCX documents to those formats.
  2. Use Oxygen's Batch Documents Converter add-on to convert the documents to DITA.
  3. You may need to make some manual adjustments in the resulting documents for elements that could not be mapped.
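
For the intermediate HTML or Markdown step, one detail worth handling is embedded images. The sketch below builds a Pandoc command that uses the --extract-media option to write the images from the DOCX file into a folder next to the output. It assumes Pandoc is installed; the function name intermediate_command, the EXTENSIONS table, and the "_media" folder suffix are illustrative choices, not conventions of Pandoc or Oxygen.

```python
from pathlib import Path

# Output file extension for each supported pandoc writer name
EXTENSIONS = {"html": ".html", "markdown": ".md"}


def intermediate_command(docx_path, out_dir, target="html"):
    """Build a pandoc argument list for DOCX to HTML or Markdown."""
    if target not in EXTENSIONS:
        raise ValueError(f"unsupported target format: {target}")
    docx_path = Path(docx_path)
    out_dir = Path(out_dir)
    output = out_dir / (docx_path.stem + EXTENSIONS[target])
    media_dir = out_dir / (docx_path.stem + "_media")
    command = ["pandoc", "-f", "docx", "-t", target]
    if target == "html":
        command.append("-s")  # produce a complete HTML document
    # Extract embedded images and point the output at the extracted files
    command += [f"--extract-media={media_dir}", "-o", str(output), str(docx_path)]
    return command
```

The returned list can be passed to subprocess.run(command, check=True). Keeping the images in a per-document folder is a design choice made here to avoid file name clashes when several documents are converted into the same output folder.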

Migrating Excel and Other Types of Spreadsheets to DITA

There are two possibilities for converting Microsoft Excel (or other similar types of documents) to DITA:

Resources