Word to DITA- image names and alt text

Questions about transforming XML with XSLT and FOP go here.
bmayer
Posts: 22
Joined: Mon Jul 09, 2012 5:30 pm

Post by bmayer »

Hi,

Hopefully I have the correct forum group. I'm working on converting a Word .docx document into XML. I followed the instructions by Radu at http://www.oxygenxml.com/forum/post28564.html#p28564 which have been very helpful. I have two questions about getting image info converted correctly:

1) After opening my Word document and viewing the document.xml file via the Archive Browser I see the following entry for an image and the alt text that I input into the Word doc:
<wp:docPr id="249" name="Picture 1" descr="Content Rating Description" title="Content Rating title"/>

After running the DOCX DITA transform the topic has:
<image href="media/image1.png"><alt>media/image1.png</alt></image>
So the conversion doesn't keep the alt text that I entered.

I noticed that the file C:\Program Files (x86)\Oxygen XML Editor 15\frameworks\dita\DITA-OT\plugins\net.sourceforge.dita4publishers.word2dita\xsl\simple2dita.xsl has this entry, which I assume is what I'll need to update. I just don't know what to change it to:

<image href="{$imageUrl}">
<alt><xsl:sequence select="$imageUrl"/></alt>
</image>

Anyone know what I should change this to?

2) Is there a way to get more meaningful names assigned to pictures? In the media folder they are all called image1.png, image2.png, and so on. I looked in Word to figure out how to name them but couldn't find anything. We can rename them manually in the folder, but I was curious whether someone had found a way to do it in Word.
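
For reference on question 1, the alt text entered in Word sits in the `descr` and `title` attributes of the `wp:docPr` element shown above. The following is only a minimal Python sketch for listing those values from a document.xml string, so they can be compared with what the transform emits; it is not part of the word2dita plugin. The function name `list_image_descriptions` and the `SAMPLE` string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "wp" prefix in WordprocessingML drawings.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def list_image_descriptions(document_xml):
    """Return name, descr and title for every wp:docPr in document.xml.

    Hypothetical helper: it only reads attributes and converts nothing.
    """
    root = ET.fromstring(document_xml)
    results = []
    for doc_pr in root.iter("{%s}docPr" % WP_NS):
        results.append({
            "name": doc_pr.get("name", ""),
            "descr": doc_pr.get("descr", ""),   # alt text description
            "title": doc_pr.get("title", ""),   # alt text title
        })
    return results


# Illustrative input built around the docPr entry quoted in the post.
SAMPLE = (
    '<w:document'
    ' xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    ' xmlns:wp="' + WP_NS + '">'
    '<w:body><w:p><w:r><w:drawing><wp:inline>'
    '<wp:docPr id="249" name="Picture 1"'
    ' descr="Content Rating Description" title="Content Rating title"/>'
    '</wp:inline></w:drawing></w:r></w:p></w:body></w:document>'
)

for entry in list_image_descriptions(SAMPLE):
    print(entry["name"], "|", entry["descr"], "|", entry["title"])
```

To run it against a real file, the document.xml content would first have to be read out of the .docx archive (for example with Python's `zipfile` module, member `word/document.xml`).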


Thanks,
Belinda
Radu
Posts: 9057
Joined: Fri Jul 09, 2004 5:18 pm

Re: Word to DITA- image names and alt text

Post by Radu »

Hi Belinda,

I am not very familiar with the Word to DITA processing, so my advice is to post these questions on the Yahoo Group DITA Users List. Eliot Kimber, the expert who developed the Word to DITA plugins, is registered there and may be able to help further.

From what I can see, in the XSLT stylesheet:

OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.word2dita/xsl/wordml2simple.xsl

there are several places where <image> tags are generated in a special namespace by matching certain MS Office elements.

Then, in the XSLT you found (simple2dita.xsl), the image elements generated in that first step are further processed to produce the DITA image tags.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
SathyaOX
Posts: 29
Joined: Wed Oct 26, 2016 8:19 pm
Location: India

Re: Word to DITA- image names and alt text

Post by SathyaOX »

Hi Radu, is there a solution for this? Images should take the alt text from the Word document as their names after conversion. Alternatively, is there a way to rename the images so that the references in the XML files are updated as well?
-Sathya
Cosmin Duna
Site Admin
Posts: 120
Joined: Wed Dec 12, 2018 5:33 pm

Re: Word to DITA- image names and alt text

Post by Cosmin Duna »

Hi Sathya,
This is an old discussion thread. In the meantime, we created an add-on named Batch Documents Converter that includes a Word to DITA conversion.
More information about the add-on is available here: https://www.oxygenxml.com/doc/versions/ ... addon.html

This conversion should preserve alternate text on images.
As far as I know, Word doesn't keep the original names of images in its internal structure. So, after the conversion, you have to use the "Rename resource" refactoring action to rename the images and update the references to them. See this documentation topic for more information: https://www.oxygenxml.com/doc/versions/ ... _resources
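
If the goal is to name images after their alt text, the new names could be worked out before applying the rename action. The following is only a minimal Python sketch of that idea, assuming the image references and alt texts have already been collected as pairs. The helpers `slugify` and `plan_renames` are hypothetical and not part of Oxygen or the add-on, and the sketch renames nothing on disk; it only proposes names.

```python
import posixpath
import re


def slugify(text):
    """Lower-case the text and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def plan_renames(images):
    """Map each image href to a proposed href derived from its alt text.

    images: list of (href, alt_text) pairs, e.g. ("media/image1.png", "Logo").
    Images without usable alt text keep their current name; duplicate
    names get a numeric suffix.
    """
    plan = {}
    used = set()
    for href, alt in images:
        folder, filename = posixpath.split(href)
        stem, ext = posixpath.splitext(filename)
        base = slugify(alt) or stem
        candidate = base
        counter = 2
        while candidate in used:
            candidate = f"{base}-{counter}"
            counter += 1
        used.add(candidate)
        plan[href] = posixpath.join(folder, candidate + ext)
    return plan


# Illustrative input; the alt text is the one quoted earlier in the thread.
proposed = plan_renames([
    ("media/image1.png", "Content Rating Description"),
    ("media/image2.png", ""),
])
for old, new in proposed.items():
    print(old, "->", new)
```

Each proposed name can then be applied with the "Rename resource" action, which is what keeps the references in the topics up to date.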

We also have a webinar that presents various migrations, including Word to DITA, along with refactoring actions that can be applied after conversion: https://www.oxygenxml.com/events/2021/w ... oring.html

Best regards,
Cosmin
Cosmin Duna
<oXygen/> XML Editor
http://www.oxygenxml.com