Batch Documents Converter Add-on

Oxygen XML Editor offers an add-on that contributes actions in the following submenus:

  • Batch Documents Converter submenu located in the Tools menu and the contextual menu of resources in the Project view.
  • Additional conversions submenu located in File > Import/Convert.
  • Import submenu located in the Append child, Insert Before, and Insert After submenus from the contextual menu of the DITA Maps Manager view.

The first time you invoke any of these actions, Oxygen XML Editor asks whether you want to install the add-on and offers a wizard to guide you through the installation. After the installation, restart Oxygen XML Editor. The same submenus will then list the available conversions. Selecting a conversion from the submenu opens a dialog box where you can configure the options for that conversion.

You can batch convert between the following formats:

  • HTML to XHTML
  • HTML to DITA
  • HTML to DocBook4
  • HTML to DocBook5
  • Markdown to XHTML
  • Markdown to DITA
  • Markdown to DocBook4
  • Markdown to DocBook5
  • Word (.doc or .docx) to XHTML
  • Word (.doc or .docx) to DITA
  • Word (.doc or .docx) to DocBook4
  • Word (.doc or .docx) to DocBook5
  • Excel to DITA
  • Confluence to DITA
  • DocBook to DITA
  • JSON to XML
  • XML to JSON
  • JSON to YAML
  • YAML to JSON

When actions are invoked from the contextual menu of the DITA Maps Manager view, the resulting documents from the conversion are automatically inserted in the map as follows:

  • Actions from Append child insert map nodes as children of the currently selected node.
  • Actions from Insert Before insert map nodes as siblings of the currently selected node, above the current node in the map.
  • Actions from Insert After insert map nodes as siblings of the currently selected node, below the current node in the map.
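
The three insertion modes can be illustrated with a minimal Python sketch that edits a map using the standard xml.etree.ElementTree module. The function name insert_topicref, the mode strings, and the file names are hypothetical and chosen only for illustration. This is not how the add-on itself is implemented.

```python
import xml.etree.ElementTree as ET

MODES = ("append_child", "insert_before", "insert_after")

def insert_topicref(map_root, selected, href, mode):
    """Insert a new topicref relative to `selected` and return it."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    new_ref = ET.Element("topicref", {"href": href})
    if mode == "append_child":
        # The new node becomes the last child of the selected node.
        selected.append(new_ref)
        return new_ref
    # ElementTree keeps no parent pointers, so look the parent up.
    # (Raises StopIteration if `selected` is the root and has no parent.)
    parent = next(p for p in map_root.iter() if selected in list(p))
    index = list(parent).index(selected)
    # Sibling above the selected node, or sibling below it.
    parent.insert(index if mode == "insert_before" else index + 1, new_ref)
    return new_ref

dita_map = ET.fromstring(
    '<map><topicref href="a.dita"/><topicref href="b.dita"/></map>'
)
first = dita_map[0]
insert_topicref(dita_map, first, "child.dita", "append_child")
insert_topicref(dita_map, first, "before.dita", "insert_before")
insert_topicref(dita_map, first, "after.dita", "insert_after")
print([ref.get("href") for ref in dita_map])
# ['before.dita', 'a.dita', 'after.dita', 'b.dita']
```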

Quick Installation

You can drag the following Install button and drop it into the main editor in Oxygen (version 24.1 or newer) to quickly initiate the installation process:

Install

Manual Installation

To manually install the Batch Documents Converter add-on:

  1. Go to Help > Install new add-ons to open an add-on selection dialog box.
  2. Enter or paste https://www.oxygenxml.com/InstData/Addons/default/updateSite.xml in the Show add-ons from field or select it from the drop-down menu.
  3. Select the Batch Documents Converter add-on and click Next.
  4. Read the end-user license agreement. Then select the I accept all terms of the end-user license agreement option and click Finish.
  5. Restart the application.

Result: A Batch Documents Converter submenu will now be available in the Tools menu and in the contextual menu. This submenu will contain a list of the various types of available conversions. Selecting one of the types of conversions will open a dialog box where you can configure options for the conversion.

The add-on can also be installed using the following alternative installation procedure:

  1. Go to https://www.oxygenxml.com/InstData/Addons/default/com/oxygenxml/oxygen-batch-converter-addon/, open the latest version's directory and download the oxygen-batch-converter-addon-{version}-plugin.jar file.
  2. Unzip it inside {oXygenInstallDir}/plugins. Make sure you don't create any intermediate folders. After unzipping the archive, the file system should look like this: {oXygenInstallDir}/plugins/oxygen-batch-converter-addon-x.y.z, and inside this folder, there should be a plugin.xml file.
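
The unzip step can also be scripted. The following Python sketch uses only the standard zipfile and pathlib modules. It assumes that the downloaded archive already contains the top-level add-on folder, as the expected layout above suggests. The function name install_plugin_jar and the paths in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path

def install_plugin_jar(jar_path, plugins_dir):
    """Extract the archive directly into the plugins folder.

    Returns the extracted top-level folders that contain a plugin.xml
    file, so the caller can verify that no intermediate folder was created.
    """
    plugins_dir = Path(plugins_dir)
    with zipfile.ZipFile(jar_path) as archive:
        # Names inside a ZIP archive always use "/" as the separator.
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
        archive.extractall(plugins_dir)
    return [
        plugins_dir / name
        for name in sorted(top_level)
        if (plugins_dir / name / "plugin.xml").is_file()
    ]

# Hypothetical usage (adjust both paths to your system):
# install_plugin_jar("oxygen-batch-converter-addon-x.y.z-plugin.jar",
#                    "/path/to/oXygenInstallDir/plugins")
```

If the returned list is empty, the archive was probably extracted into an extra intermediate folder, or it does not have the expected layout.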

Configuration

Options for configuring the conversions can be found in the preferences page of the add-on (Options > Preferences > Plugins > Batch Documents Converter) or in the conversion dialog box.

Conversions from Word (Word Styles Mapping)

The conversions from MS Word work best if you only use the MS Word styles to semantically mark up your document. It is important that sections from the Word document are well defined using the heading styles.

Use the Word styles mapping option from the Batch Documents Converter preferences page to configure any of the Word conversions (Word to XHTML, Word to DITA, Word to DocBook4, and Word to DocBook5) by mapping Word elements and styles to the corresponding HTML elements.

If the Word document contains paragraphs formatted with custom styles, they have to be set in the Word styles mapping configuration. Those that are not set will be converted into simple paragraphs. To add custom styles, you can use the default configuration from the table. For example, if you use a custom Word style named Document Title, you can map this to the HTML "h1" element:

Word element    Word style        HTML elements
p               Document Title    h1:fresh

The resulting 'h1' element will be transformed into the corresponding element when converting to DITA, DocBook 4, and DocBook 5.

The Word styles mapping table contains the following columns:

Word element

This column allows one of the following Word elements:

  • p - Word paragraph
  • r - Word run
  • b - bold text
  • i - italicized text
  • u - underlined text
  • strike - strikethrough text
  • table - table
  • p:unordered-list(x) - unordered list (where 'x' is the nesting level of the list)
  • p:ordered-list(x) - ordered list (where 'x' is the nesting level of the list)

Word style

This column can be used to map a paragraph, run, or table with a specific style (referenced by name).

Styles can also be referenced by style ID. This is the ID used internally in the .docx file. To map a paragraph or run with a specific style ID, append a dot followed by the style ID in the Word element column (for example: p.Heading1).

HTML elements

This column can be used to map the resulting HTML elements. It allows a single element or multiple nested elements.

The nested elements can be declared by using the '>' character (for example: ul > li).

The class attribute can be specified on the resulting HTML elements by appending a dot followed by the class value, after the element (for example: p.myClass).

When converting Word to DITA, these class attributes are automatically converted to outputclass attributes. This may be useful if you want to apply extra processing on the resulting DITA document using a custom XSL stylesheet.

The :fresh syntax can be used to create new elements. If it is not used, the converter will try to reuse the element and close it only when it is necessary.

For example, consider the following configuration:

Word element    Word style    HTML elements
p               Heading 1     h1

When the converter finds consecutive Word paragraphs with the style name Heading 1, it converts them into a single h1 element that contains the combined text of all those paragraphs.

If h1:fresh is set in the last column, the converter will create separate h1 elements.

To ignore elements, the '!' character can be added in the HTML elements column.
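
To make the notation in the HTML elements column concrete, the following Python sketch parses a cell value into its parts. It only illustrates the syntax described above ('>' for nesting, a dot for the class, :fresh, and '!'). The HtmlTarget class and the parse_html_elements function are hypothetical names and do not reflect the converter's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HtmlTarget:
    tag: str
    css_class: Optional[str] = None
    fresh: bool = False

def parse_html_elements(value: str) -> List[HtmlTarget]:
    """Parse a cell such as 'ul > li.item:fresh' into nested targets.

    Returns an empty list for '!', which marks the Word element as ignored.
    """
    value = value.strip()
    if value == "!":
        return []
    targets = []
    for part in value.split(">"):          # '>' separates nested elements
        part = part.strip()
        fresh = part.endswith(":fresh")    # ':fresh' forces a new element
        if fresh:
            part = part[: -len(":fresh")]
        tag, _, css_class = part.partition(".")  # 'p.myClass' -> tag, class
        targets.append(HtmlTarget(tag, css_class or None, fresh))
    return targets

for cell in ("h1:fresh", "ul > li", "p.myClass", "!"):
    print(cell, "->", parse_html_elements(cell))
```

In this sketch, an empty result stands for an ignored element, and the order of the returned items follows the nesting from the outermost to the innermost element.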

The Export button can be used to export the Word styles configuration to an XML file. This exported file can be used to configure the MS Word Dynamic Conversion from Oxygen XML Editor by copying the file into the DITA-OT plugin directory: [OXYGEN_INSTALL_DIR]/frameworks/dita/DITA-OT3.x/plugins/com.oxygenxml.dynamic.resources.converter.

The Import button allows you to import the Word styles configuration from an exported XML file.

Note: The Word styles mapping configuration is applied only for the newer version of MS Word files formatted in the Microsoft Office Open XML (DOCX) format.

Maximum Heading Level for Creating Topics

The Maximum heading level for creating topics option from the Batch Documents Converter preferences page allows you to set the maximum heading level that the converter will process as DITA topics. Headings deeper than this level are converted to section elements.

When the output is a DITA topic, this option sets the maximum heading level that will be converted as a nested topic in the document.

When the output is a DITA map, this option sets the maximum heading level that will be extracted as a DITA topic file and referenced in the DITA map hierarchy.
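
The rule can be summarized in a short Python sketch. The function name dita_element_for_heading and the sample value 2 for the option are assumptions made for illustration. The sketch only restates the documented behavior and is not taken from the add-on.

```python
def dita_element_for_heading(level: int, max_topic_level: int) -> str:
    """Return the DITA structure a heading of the given level maps to."""
    return "topic" if level <= max_topic_level else "section"

# With the option set to 2, levels 1 and 2 become topics
# and every deeper heading becomes a section.
for level in range(1, 5):
    print(level, dita_element_for_heading(level, max_topic_level=2))
```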

Note: This option only applies to the HTML to DITA and Word to DITA conversions.

Word to DITA

The Create DITA maps from Word documents containing multiple headings option from the conversion dialog box allows you to decide whether the output will be a DITA map or a DITA topic. When this option is selected, the sections from your Word document marked by titles or headings will be separated into individual DITA topics and referenced in a DITA map. If the Word document does not contain multiple sections, the output will be a single topic. When this option is not selected, the output will be a topic with nested topics and sections according to the number of titles and headings from the Word document.

Markdown to DITA

The Create DITA maps from Markdown documents containing multiple headings option from the conversion dialog box allows you to decide whether the output will be a DITA map or a DITA topic. When this option is selected, all headings from your Markdown document will be separated into individual DITA topics and referenced in a DITA map. If the Markdown document does not contain multiple headings, the output will be a single topic. When this option is not selected, the output will be a topic with nested topics or sections according to the number of headings from the document.
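
As a rough illustration of this decision, the following Python sketch predicts the kind of output for a Markdown document. It is a simplification under stated assumptions: it counts only ATX headings (lines that start with one to six '#' characters), ignores Setext headings and fenced code blocks, and uses the hypothetical function name expected_output_kind. The converter's real heading detection may differ.

```python
import re

# One to six '#' characters, whitespace, then some heading text.
ATX_HEADING = re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE)

def expected_output_kind(markdown_text: str, create_maps: bool) -> str:
    """Return 'map' or 'topic' according to the rule described above."""
    heading_count = len(ATX_HEADING.findall(markdown_text))
    if create_maps and heading_count > 1:
        return "map"
    return "topic"

sample = "# Install\nSome text.\n\n## Configure\nMore text.\n"
print(expected_output_kind(sample, create_maps=True))   # map
print(expected_output_kind(sample, create_maps=False))  # topic
```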

The Create short description elements option from the conversion dialog box allows you to decide whether or not the shortdesc elements are created in the output DITA document. When this option is selected, the first paragraph before the headings from the Markdown document will be converted into DITA short description elements. When this option is not selected, the output will not contain the short description element.

HTML to DITA

The Create DITA maps from HTML documents containing multiple headings option from the conversion dialog box allows you to decide whether the output will be a DITA map or a DITA topic. When this option is selected, the headings from your HTML document will be separated into individual DITA topics and referenced in a DITA map. If the HTML document does not contain multiple sections, the output will be a single topic. When this option is not selected, the output will be a topic with nested topics or sections according to the number of headings from the document.

Confluence to DITA

The Confluence to DITA conversion processes the HTML content generated by the Atlassian® Confluence (see https://www.atlassian.com/software/confluence) export process. To export Confluence content to HTML, log in to your Atlassian® Confluence account and navigate to the specific space that you want to export. Then go to Space Settings > Export space and choose to export it as HTML. The resulting index.html file must be provided in the Input files list from the conversion dialog box.

DocBook to DITA

The Create DITA maps from DocBook documents containing multiple sections option from the conversion dialog box allows you to decide whether the output will be a DITA map or a DITA topic. When this option is selected, the sections from your DocBook document will be converted into individual DITA topics and referenced in a DITA map. When this option is not selected, the output will be a single topic with nested topics.

Resources

For more information about the Batch Converter add-on, as well as details regarding other popular add-ons that extend the functionality of Oxygen XML Editor, watch the following webinars/presentations: