Expected Format of DOCX Files

Jamil
Posts: 97
Joined: Thu Oct 23, 2008 6:29 am

Post by Jamil »

I have been trying to get the MS Word DOCX to DITA transformation to work, but so far without success. When I open a DOCX file in Oxygen 15.1 and run the transformation scenario, this is the failure I see:

    [xslt]  - [ERROR] The first block in the Word document must be mapped to the root map or topic title.
    [xslt] First para is style Normal, mapped as <style xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:public:dita4publishers.org:namespaces:word2dita:style2tagmap" styleName="Normal" tagName="p" topicZone="body" level="1"/>

Based on this error, it looks like the first paragraph of the Word document must use the Title style. So I delete both the out and temp folders in the Oxygen project folder and edit the Word document, adding a new first line, "Test", set to the Title style.

I then save the DOCX file in Word and reopen Oxygen to try again. When I retry the transformation, I now see this error:

    [xslt]  + [DEBUG] generateTopicrefs: Starting, level=1
    [xslt] C:\Program Files\Oxygen XML Editor 15\frameworks\dita\DITA-OT\plugins\net.sourceforge.dita4publishers.word2dita\xsl\simple2dita.xsl:554: Fatal Error! An empty sequence is not allowed as the value of variable $firstP

It looks like the Word document must follow an expected format for the DITA transformation to succeed. Is there a guide that describes this format? I am only having issues with the DITA transformation; the DOCX to TEI P5 transformation works without issue.

Thanks.
Jamil

Re: Expected Format of DOCX Files

Post by Jamil »

I played around with this a bit, and I got it to work successfully.

In case anyone else encounters this issue, the Word document needs the following for the DITA XSL transformation to work as is, without modification:
  • A document title using the built-in Title style
  • A table of contents created with Word's TOC feature; note that this requires heading styles to identify the TOC items
  • Paragraphs of text using Word's Normal style
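
For anyone who wants to check a document before running the transformation, the list above can be sketched as a small pre-flight script. This is only a rough sketch, not part of Oxygen or the DITA for Publishers plugin: the function names (paragraph_styles, has_toc_field, check_docx_xml, check_docx) are made up for illustration, and it assumes the English style IDs "Title" and "Heading1", "Heading2", and so on, which can differ in localized Word installations. It reads word/document.xml from the DOCX package and treats a paragraph with no explicit style as Normal.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_styles(document_xml):
    """Return the style ID of every paragraph ('Normal' when none is set)."""
    root = ET.fromstring(document_xml)
    styles = []
    for para in root.iter(f"{{{W}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        styles.append(style.get(f"{{{W}}}val") if style is not None else "Normal")
    return styles


def has_toc_field(document_xml):
    """Return True if a field instruction starting with TOC is present."""
    root = ET.fromstring(document_xml)
    for instr in root.iter(f"{{{W}}}instrText"):
        if (instr.text or "").strip().startswith("TOC"):
            return True
    for simple in root.iter(f"{{{W}}}fldSimple"):
        if simple.get(f"{{{W}}}instr", "").strip().startswith("TOC"):
            return True
    return False


def check_docx_xml(document_xml):
    """Return a list of problems; an empty list means all three checks passed."""
    styles = paragraph_styles(document_xml)
    problems = []
    # Assumption: the built-in title style has the ID "Title" (English Word).
    if not styles or styles[0] != "Title":
        problems.append("first paragraph does not use the Title style")
    # Assumption: heading style IDs start with "Heading" (English Word).
    if not any(s.startswith("Heading") for s in styles):
        problems.append("no heading styles found")
    if not has_toc_field(document_xml):
        problems.append("no TOC field found")
    return problems


def check_docx(path):
    """Run the checks against the main document part of a DOCX file."""
    with zipfile.ZipFile(path) as docx:
        return check_docx_xml(docx.read("word/document.xml"))
```

Calling check_docx("mydoc.docx") (a hypothetical file name) returns a list of messages. Passing these checks does not guarantee the transformation will succeed; it only covers the three items listed above.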
Jamil

Re: Expected Format of DOCX Files

Post by Jamil »

One final comment on this:

If the final output contains incorrect links, the most likely cause is extra spacing in the Word document. I found that when I indent paragraphs and add extra blank lines to the original Word document, things break during the final transformation. When I remove the indentation and extra spacing, it works.

I always have the option of modifying the CSS in the output to add the spacing back in, so this is not a big deal to me.
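
To find the blank lines and indentation described above without scrolling through the whole document, something like the following sketch could flag candidate paragraphs. The function name spacing_suspects is hypothetical, and the checks are guesses at what "extra spacing" looks like in word/document.xml (an empty paragraph, a direct w:ind indentation setting, or text that starts with whitespace). I have not confirmed which of these actually upsets the plugin, and styled paragraphs such as list items may legitimately carry w:ind.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def spacing_suspects(document_xml):
    """Return (paragraph index, reason) pairs for empty or indented paragraphs."""
    root = ET.fromstring(document_xml)
    suspects = []
    for index, para in enumerate(root.iter(f"{{{W}}}p")):
        # Join the visible text of all runs in this paragraph.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        if not text.strip():
            suspects.append((index, "empty paragraph"))
        elif para.find("w:pPr/w:ind", NS) is not None:
            # Indentation applied directly to the paragraph.
            suspects.append((index, "paragraph-level indentation"))
        elif text[0] in " \t":
            # Indentation faked with leading spaces.
            suspects.append((index, "leading whitespace"))
    return suspects
```

The input is the content of word/document.xml, which can be read from the DOCX file with Python's zipfile module. The indexes count paragraphs from zero in document order.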
Radu
Posts: 9018
Joined: Fri Jul 09, 2004 5:18 pm

Re: Expected Format of DOCX Files

Post by Radu »

Hi Jamil,

Thanks for updating the thread with your solution.
The DOCX to DITA plugins in the DITA-OT bundled with Oxygen are developed by the DITA for Publishers project.
Please read this for more details:

http://www.oxygenxml.com/forum/post28506.html#p28506

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com