docx to docbook

ramjill
Posts: 16
Joined: Tue Nov 08, 2011 2:14 pm

docx to docbook

Post by ramjill »

Hi,

Can you tell me how to convert DOCX to DocBook in Oxygen?

Is this possible?
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: docx to docbook

Post by sorin_ristache »

Hello,

There are stylesheets for the DOCX to DocBook conversion in the DocBook XSL package, which is included in Oxygen. The procedure is the following:
  • First, open the DOCX document in Oxygen. Oxygen opens it in the Archive Browser view and also opens the document.xml file from the DOCX archive automatically. The content of the DOCX document is stored in this document.xml file.
  • Create an XSLT transformation scenario associated with the document.xml file that applies a sequence of 4 XSLT stylesheets to this file, in this order:
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/wordml2normalise.xsl
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/normalise2sections.xsl
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/sections2blocks.xsl
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/blocks2dbk.xsl


    In the XSL URL text box of the dialog box for creating the scenario, set the [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/wordml2normalise.xsl stylesheet. Add the other 3 stylesheets to the scenario using the Additional XSLT stylesheets button.
The result of the transformation, that is, the file that you set in the Save As field of the dialog box, should be a DocBook XML document with the same content as the source DOCX document.
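
To make the inputs of this procedure concrete, here is a minimal Python sketch. It only reads the document.xml part out of a DOCX archive and builds the ordered list of the four stylesheet paths; it does not run the XSLT itself. It assumes the usual DOCX layout, where the main part is stored as word/document.xml inside a ZIP archive. The function names and the oxygen_dir argument are hypothetical, chosen for illustration.

```python
import zipfile
from pathlib import PurePosixPath

# Stylesheet order as listed in the procedure above; each stage
# is meant to consume the output of the previous one.
ROUNDTRIP_STAGES = [
    "wordml2normalise.xsl",
    "normalise2sections.xsl",
    "sections2blocks.xsl",
    "blocks2dbk.xsl",
]


def roundtrip_stylesheets(oxygen_dir):
    """Return the four stylesheet paths in cascade order.

    oxygen_dir is a hypothetical stand-in for [Oxygen-folder].
    """
    base = PurePosixPath(oxygen_dir) / "frameworks" / "docbook" / "xsl" / "roundtrip"
    return [str(base / name) for name in ROUNDTRIP_STAGES]


def read_document_xml(docx):
    """Return the raw bytes of word/document.xml from a DOCX file.

    docx may be a file path or a binary file-like object.
    Assumes the main document part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml")
```

The first path in the returned list corresponds to the XSL URL field, and the remaining three to the additional stylesheets of the scenario.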


Regards,
Sorin
ramjill
Posts: 16
Joined: Tue Nov 08, 2011 2:14 pm

Re: docx to docbook

Post by ramjill »

hi,

I tried what you said, but after creating and running the DOCX to DocBook XSLT transformation, I got an error:

Severity: error
Description: Cannot apply cascading transformation. Reason: org.xml.sax.SAXParseException: Content is not allowed in prolog.
ramjill
Posts: 16
Joined: Tue Nov 08, 2011 2:14 pm

Re: docx to docbook

Post by ramjill »

Hi,

Can you explain with screenshots "how to create an XSLT transformation"?

It will be helpful for me.

Thanks in advance.
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: docx to docbook

Post by sorin_ristache »

Hi,
ramjill wrote: I tried what you said, but after creating and running the DOCX to DocBook XSLT transformation, I got an error:

Severity: error
Description: Cannot apply cascading transformation. Reason: org.xml.sax.SAXParseException: Content is not allowed in prolog.

What is the namespace declaration of the root element of the document.xml file? Is it xmlns:w="http://schemas.openxmlformats.org/wordp ... /2006/main"? If so, the DocBook stylesheets must first be modified to process XML documents with this namespace (http://schemas.openxmlformats.org/wordp ... /2006/main). I will submit an enhancement request to the DocBook XSL project for handling this namespace.

If not, please send us a sample DOCX file so that we can reproduce the problem.
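
If it is easier than reading the file by eye, the namespace of the root element can also be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name root_namespace is hypothetical, and the namespace used in the usage note is a made-up placeholder, not the real WordprocessingML URI.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_bytes):
    """Return (namespace URI, local name) of the root element.

    ElementTree stores a namespaced tag as "{uri}local", so the URI
    is whatever sits between the braces. The URI is None when the
    root element is not in any namespace.
    """
    root = ET.fromstring(xml_bytes)
    tag = root.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag
```

For example, calling root_namespace(b'<w:document xmlns:w="urn:example:placeholder"/>') returns the pair ("urn:example:placeholder", "document"). Passing the bytes of the real document.xml shows the URI that the stylesheets have to match.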


Regards,
Sorin
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: docx to docbook

Post by sorin_ristache »

ramjill wrote: Hi,

Can you explain with screenshots "how to create an XSLT transformation"?

It will be helpful for me.

Thanks in advance.

In my first reply I included a link to the User Manual topic about creating an XSLT transformation. There you can see screenshots of the 3 tabs of this dialog box and explanations of the parameters that can be set in it.


Regards,
Sorin
ramjill
Posts: 16
Joined: Tue Nov 08, 2011 2:14 pm

Re: docx to docbook

Post by ramjill »

Hi,

Yes, the namespace declaration of the root element of the document.xml file is xmlns:w="http://schemas.openxmlformats.org/wordp ... /2006/main".

So I changed the namespace in the wordml2normalise.xsl stylesheet to xmlns:w="http://schemas.openxmlformats.org/wordp ... /2006/main".

Is that correct?

After that I ran the transformation, but I got two errors:

Failed to parse stylesheet
xpointer attribute must be present when href attribute is absent.


How can I resolve this? Did I make a mistake?
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: docx to docbook

Post by sorin_ristache »

I think you did not replace the namespace correctly. I replaced the value of the xmlns:w attribute that is declared on the root element of wordml2normalise.xsl, and the transformation does not report errors, but the result XML file contains only a list of paragraphs. That means more changes must be applied to the stylesheets to adjust them to the namespace (and structure) of Word documents. As I said, I will submit an enhancement request to the DocBook XSL developers for handling this namespace correctly in the DocBook XSL stylesheets.
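
To verify that the replacement itself was done correctly, the namespace bound to the w prefix in a stylesheet can be compared with the one in document.xml. The following is a minimal sketch using only the Python standard library; the function names are hypothetical. It checks only that the two URIs are identical, which is a precondition and not a guarantee of a correct conversion, since the structure of the document matters as well.

```python
import io
import xml.etree.ElementTree as ET


def prefix_bindings(xml_bytes):
    """Map each declared prefix to the set of namespace URIs bound to it."""
    bindings = {}
    # The "start-ns" event yields a (prefix, uri) pair for every
    # namespace declaration encountered while parsing.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        bindings.setdefault(prefix, set()).add(uri)
    return bindings


def prefix_matches(stylesheet_bytes, document_bytes, prefix="w"):
    """True when both files bind the prefix to exactly the same URI(s)."""
    in_stylesheet = prefix_bindings(stylesheet_bytes).get(prefix, set())
    in_document = prefix_bindings(document_bytes).get(prefix, set())
    return bool(in_stylesheet) and in_stylesheet == in_document
```

A result of False for any of the four roundtrip stylesheets means that its templates cannot match the elements of the Word document at all.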


Regards,
Sorin
ramjill
Posts: 16
Joined: Tue Nov 08, 2011 2:14 pm

Re: docx to docbook

Post by ramjill »

Hi,

I got this error from the blocks2dbk.xsl file after replacing the namespace.

Engine name: Saxon6.5.5
Severity: error
Description: Failed to parse stylesheet


Engine name: Saxon6.5.5
Severity: fatal
Description: xpointer attribute must be present when href attribute is absent.
Start location: 178:21


<xsl:when test='@rnd:style = "d:xinclude"'
          xmlns:xi='http://www.w3.org/2001/XInclude'>
  <xi:include>
    <xsl:attribute name='href'>
      <xsl:apply-templates mode='rnd:xinclude'/>
    </xsl:attribute>
  </xi:include>
</xsl:when>

I got this error at the tag above.
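
One possible reading of this error, stated as an assumption and not as a confirmed diagnosis: the message is the kind that an XInclude-aware XML parser reports when it meets an xi:include element that has neither an href nor an xpointer attribute. In the fragment above, the href is only produced at transformation time by xsl:attribute, so the literal xi:include element in the stylesheet source carries no href of its own. The minimal Python sketch below lists such elements in a stylesheet; the function name is hypothetical, and ElementTree does not perform XInclude processing when parsing, so reading the file this way does not trigger the same error.

```python
import xml.etree.ElementTree as ET

# XInclude namespace as declared in the stylesheet fragment above.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def includes_without_target(stylesheet_bytes):
    """Return the xi:include elements that have no literal href or xpointer.

    An href added later through xsl:attribute does not count, because
    it is a child element, not an attribute, in the stylesheet source.
    """
    root = ET.fromstring(stylesheet_bytes)
    return [
        element
        for element in root.iter(XI_INCLUDE)
        if "href" not in element.attrib and "xpointer" not in element.attrib
    ]
```

If the list is not empty and the assumption holds, the thing to check is whether XInclude processing is applied to the stylesheet when it is parsed.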
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: docx to docbook

Post by sorin_ristache »

Hi,

I think the changes that must be applied to the DocBook XSL stylesheets are not trivial. If you want to correct the stylesheets yourself, please contact the DocBook XSL developers. They can help you with this task.


Regards,
Sorin
kec
Posts: 1
Joined: Thu Nov 14, 2013 9:53 pm

Re: docx to docbook

Post by kec »

This is an older post, but a current topic for me. If I try to follow the instructions today with the latest versions of Oxygen, I get the same exceptions.

Has anyone gotten the scenario to work with current versions of Word (I'm using Word for Mac 2011)? Does anyone know the secret handshake required to get it to work?

Thanks,

Keith
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: docx to docbook

Post by sorin_ristache »

Hi Keith,

I think the DocBook XSL stylesheets for the DOCX to DocBook conversion have not been improved yet. You can either raise the issue with the DocBook XSL project or improve these stylesheets yourself. In Oxygen the XSLT stylesheets for the DOCX => DocBook conversion are located in the directory:

[Oxygen-install-dir]/frameworks/docbook/xsl/roundtrip

Regards,
Sorin