XHTML to DocBook conversion

vincentb
Posts: 2
Joined: Thu Jul 15, 2021 11:28 am

XHTML to DocBook conversion

Post by vincentb »

Hi all,

I currently work at a small company that runs an online shop.
Over the years I have written many detailed product sheets in XHTML, and each time we changed our website engine (home-made, then Prestashop or Magento) for various business reasons, I had to modify those product sheets to make them compatible with the new website structure.
So I decided to convert all our product sheets to DocBook 5. All our new product sheets are now written directly in DocBook 5 using Oxygen tools.

I have written an XSL stylesheet that transforms those product sheets into our current XHTML output format (the new Oxatis SaaS engine), and I use a transformation scenario to generate the XHTML automatically; all the outputs are XHTML fragments, not whole pages.

I am now looking to do the opposite: convert XHTML fragments to DocBook 5. This would let me recover the existing product sheets that were written in XHTML over the years.

Is there a tool that automatically converts fragments of XHTML code to DocBook?
It is important to do this with a script or another automated method (not copy/paste) because we already have hundreds, maybe thousands, of product sheets.

Thanks for your help
Vincent
Radu
Posts: 8991
Joined: Fri Jul 09, 2004 5:18 pm

Re: XHTML to DocBook conversion

Post by Radu »

Hi Vincent,

We have a batch converter add-on for Oxygen allowing you to transform multiple HTML or XHTML documents to DocBook 4 or 5:
https://www.oxygenxml.com/doc/versions/ ... addon.html

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
vincentb
Posts: 2
Joined: Thu Jul 15, 2021 11:28 am

Re: XHTML to DocBook conversion

Post by vincentb »

Hi Radu,

I have installed the add-on and it works fine!
I need to use another XSL stylesheet to transform the add-on's DocBook output so that it matches my own DocBook structure, which contains some attributes (xml:id or xml:lang) and uses section/simplesect.
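
For readers following along, a post-processing stylesheet of this kind could look roughly like the sketch below. It is only an illustration, not the stylesheet actually used here: it assumes the converter output is in the DocBook 5 namespace, the language code "fr" is a placeholder, and the generated xml:id values are arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook">

  <!-- Identity template: copy everything that no other template handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Add xml:lang to the document element when it is missing.
       "fr" is a placeholder value (assumption). The explicit priority
       avoids a conflict if the document element is itself a section. -->
  <xsl:template match="/*[not(@xml:lang)]" priority="1">
    <xsl:copy>
      <xsl:attribute name="xml:lang">fr</xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Give every section that has no xml:id a generated one. -->
  <xsl:template match="db:section[not(@xml:id)]">
    <xsl:copy>
      <xsl:attribute name="xml:id">
        <xsl:value-of select="generate-id()"/>
      </xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Note that generate-id() values are not guaranteed to be stable between runs or processors, so identifiers derived from the source file name or a title may be a better choice for long-lived documents.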

Thank you for your help.

Is it possible to customize the output document?
Some attributes (class) in the original XHTML file would be useful to keep, but they don't appear in the output XML file. I know it is difficult to map XHTML classes to DocBook attributes, but the class attribute is useful for mapping <div> tags from XHTML to <section> tags in DocBook.

Regards
Vincent
Cosmin Duna
Site Admin
Posts: 120
Joined: Wed Dec 12, 2018 5:33 pm

Re: XHTML to DocBook conversion

Post by Cosmin Duna »

Hi Vincent,

This conversion is based on XSLT, so you can modify the XSLT stylesheets to change the output. After the add-on is installed, Oxygen saves the add-on's files in this directory: "C:\Users\user_name\AppData\Roaming\com.oxygenxml\extensions\v24.0\plugins\https_www.oxygenxml.com_InstData_Addons_default_updateSite.xml\oxygen-batch-converter-3.1.0".
The stylesheets are located in the "oxygen-batch-converter-3.1.0.jar" file in the "lib" directory; to edit them, open this JAR in the "Archive Browser" view in Oxygen.
The "stylesheets/docbook/xhtml2db4.xsl" file is used for the DocBook 4 conversion and the "stylesheets/docbook/xhtml2db5.xsl" file for DocBook 5. To keep the "class" attribute in the output, open one of these files and add this template:

<xsl:template match="@class">
  <xsl:copy-of select="."/>
</xsl:template>
You have to restart the application after making the modifications. After that, the "class" attributes will be preserved in the output of the HTML to DocBook conversion.
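
One caveat for anyone validating the result: as far as I know, most DocBook 5 elements (section and para, for example) do not allow a class attribute, while the common role attribute is allowed on all of them. A follow-up transformation could therefore rename the preserved class attributes to role. The sketch below is a separate, hypothetical post-processing stylesheet, not part of the add-on; the list of elements in the match pattern is an assumption and should be adjusted to whatever the converter actually emits.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook">

  <!-- Identity template: copy everything that no other template handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename class to role on selected elements only (assumed list).
       Elements with a native DocBook class attribute are left untouched
       because they are not named in this pattern. -->
  <xsl:template match="db:section/@class | db:simplesect/@class | db:para/@class">
    <xsl:attribute name="role">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

If an element already carries a role attribute, whichever of the two is written last replaces the other, so that case may need its own template.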

Regards,
Cosmin
Cosmin Duna
<oXygen/> XML Editor
http://www.oxygenxml.com