
Problems with Batch converter

Posted: Thu Jan 16, 2020 5:20 pm
by zuza
converter.zip
(3.99 KiB) Downloaded 223 times
Hello,

Soon we will have to deliver DITA files to customers so that they can use them in their documentation.
Long story short, since there is no way to do a dita2dita transformation that does all the filtering and key replacing, I am trying to do a DITA->(X)HTML->DITA conversion (the map export transformation doesn't resolve the keys/remove comments/etc, so it doesn't help us much).

To convert (X)HTML to DITA, I installed the Batch Converter add-on. The first problem I encountered is that after installing it in XML Editor 21.1, build 2019120214, there is no Batch Converter submenu available in the Tools menu or in the contextual menu of the Project view. The Batch Converter is listed if I go to Manage Add-Ons, showing version 1.1.16.
I got around this by using the XML Editor 21.0, build 2019022207 installation which was luckily still available on my machine and had the Batch Converter 1.1.14 - I updated that to the latest version and it does show up in the Tools menu and contextual menu of the Project view.

There are a few issues with using the Batch Converter to convert html5 to DITA, the main one being the topic title not being included in <title> but in <body>. This is not an issue when converting xhtml to DITA, but there are other issues.

I have attached a zip file with a small test - r_characteristics.dita is the original file, the 2 .html files are obtained using the HTML5 and XHTML transformations, which are then transformed back to DITA using the Batch Converter.

Is there any chance that the issues highlighted in the converted DITA files can be fixed in the very near future so that we can try this roundabout approach to convert DITA to DITA?

It would also be great if a dita2dita transformation were available. As more and more companies use DITA, there will be increasing demand to deliver DITA source files to customers so that they can reuse the content in their product documentation (I found a two-year-old thread about this on the dita-users group!). Is this something that you might consider developing?

Thank you.

Best regards,
Ozana

Re: Problems with Batch converter

Posted: Fri Jan 17, 2020 12:11 pm
by Radu
Hi Ozana,

Please see some answers below:
Long story short, since there is no way to do a dita2dita transformation that does all the filtering and key replacing, I am trying to do a DITA->(X)HTML->DITA conversion (the map export transformation doesn't resolve the keys/remove comments/etc, so it doesn't help us much).
Jarno Elovirta created a DITA normalize plugin which is also part of the DITA OT bundled with Oxygen:

https://github.com/dita-ot/org.dita.normalize

It should be useful for such DITA to DITA conversions with filtering and key resolution, but I have not worked much with it.
To convert (X)HTML to DITA, I installed the Batch Converter add-on. The first problem I encountered is that after installing it in XML Editor 21.1, build 2019120214, there is no Batch Converter submenu available in the Tools menu or in the contextual menu of the Project view. The Batch Converter is listed if I go to Manage Add-Ons, showing version 1.1.16.
We cannot reproduce the problem on our side. In Oxygen 21.1, can you try uninstalling the Batch Converter and installing it again? Also, in the Preferences->"Plugins" page, make sure the plugin is checked/enabled.
I got around this by using the XML Editor 21.0, build 2019022207 installation which was luckily still available on my machine and had the Batch Converter 1.1.14 - I updated that to the latest version and it does show up in the Tools menu and contextual menu of the Project view.
Ok.
There are a few issues with using the Batch Converter to convert html5 to DITA, the main one being the topic title not being included in <title> but in <body>. This is not an issue when converting xhtml to DITA, but there are other issues.
I have attached a zip file with a small test - r_characteristics.dita is the original file, the 2 .html files are obtained using the HTML5 and XHTML transformations, which are then transformed back to DITA using the Batch Converter.

Is there any chance that the issues highlighted in the converted DITA files can be fixed in the very near future so that we can try this roundabout approach to convert DITA to DITA?
We can reproduce the problem, and I will add an internal issue to look into fixing it. We may have time to look into this in a couple of weeks, but I make no guarantee. I will update this thread when a fix becomes available in a new version of the add-on.

Regards,
Radu

Re: Problems with Batch converter

Posted: Fri Jan 17, 2020 1:09 pm
by zuza
Thanks Radu,

I tried the DITA normalize plugin and it does the filtering and brings in the values of the keys, but:
  • the attributes with the filter values are not removed, so it still shows <li otherprops="TCU">Specialised Texture Cache Unit (TCU)</li>
  • the key references are not removed, but instead the value is inserted as text, for example where we had <keyword keyref="var_variant_number"/> now it shows <keyword keyref="var_variant_number">GM9200</keyword>
  • oXygen comments and markup are still in the topics
Thinking about it again, post-processing this to remove the @product, @otherprops and @audience attributes, unwrap the <keyword> elements (not sure yet what to do when <ph> is used instead of <keyword>) and remove the oxygen processing instructions might be the easiest way for the moment.
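A minimal sketch of that post-processing, written in Python with only the standard library rather than as an XSLT stylesheet. The function names (clean, clean_topic) and the sample markup are made up for illustration. It assumes that only <keyword> elements carrying @keyref should be unwrapped (it leaves <ph> alone), and it relies on ElementTree's default parser discarding processing instructions and XML comments, which removes the oXygen markup but also any other processing instruction or comment. The DOCTYPE is lost on output and would need to be added back.

```python
import xml.etree.ElementTree as ET

# Filter attributes named in this thread; extend the tuple as needed.
FILTER_ATTRS = ("product", "otherprops", "audience")


def _add_text(parent, index, text):
    """Append text to whatever precedes child position `index` in parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def clean(elem):
    """Strip filter attributes and unwrap <keyword keyref="..."> children."""
    for name in FILTER_ATTRS:
        elem.attrib.pop(name, None)
    i = 0
    while i < len(elem):
        child = elem[i]
        clean(child)
        if child.tag == "keyword" and "keyref" in child.attrib:
            # Replace the element with its own content: text, children, tail.
            del elem[i]
            _add_text(elem, i, child.text)
            for sub in list(child):
                elem.insert(i, sub)
                i += 1
            _add_text(elem, i, child.tail)
        else:
            i += 1


def clean_topic(xml_text):
    """Parse one topic (PIs and comments are dropped here) and clean it."""
    root = ET.fromstring(xml_text)
    clean(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample, modelled on the snippets quoted above.
sample = (
    '<ul><li otherprops="TCU">Specialised Texture Cache Unit (TCU)</li>'
    '<li>Variant <keyword keyref="var_variant_number">GM9200</keyword>'
    '<?oxy_comment_start comment="check"?> core<?oxy_comment_end?></li></ul>'
)
print(clean_topic(sample))
```

Running this over the output of the normalize plugin, one file at a time, should leave plain text where the resolved keywords were. Whether <ph keyref="..."> should be unwrapped in the same way is still an open question.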

Best regards,
Ozana

Re: Problems with Batch converter

Posted: Fri Jan 17, 2020 1:21 pm
by Radu
Hi Ozana,

Indeed, you could create a custom XSLT stylesheet that does the cleanup and apply it as an XML refactoring action in Oxygen:

https://www.oxygenxml.com/doc/versions/ ... tools.html

Regards,
Radu

Re: Problems with Batch converter

Posted: Wed May 20, 2020 4:13 pm
by Cosmin Duna
Hello Ozana,

Just wanted to let you know that we have released a new version of the Oxygen Batch Converter add-on that improves the "HTML to DITA" conversion.

Best regards,
Cosmin