docx to docbook

Posted: Sun Nov 13, 2011 9:53 am
by ramjill
Hi,

Can you tell me how to convert DOCX to DocBook in Oxygen?

Is this possible?

Re: docx to docbook

Posted: Mon Nov 14, 2011 12:00 pm
by sorin_ristache
Hello,

The DocBook XSL package included in Oxygen contains stylesheets for the DOCX to DocBook conversion. The procedure is the following:
  • Open the DOCX document in Oxygen. It opens automatically in the Archive Browser view, and Oxygen also opens the document.xml file from the DOCX document. The content of the DOCX is stored in this document.xml file.
  • Create an XSLT transformation scenario associated with the document.xml file that applies a sequence of 4 XSLT stylesheets to this file, in this order:
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/wordml2normalise.xsl
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/normalise2sections.xsl
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/sections2blocks.xsl
    • [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/blocks2dbk.xsl

    In the XSL URL text box of the dialog box for creating the scenario, set the [Oxygen-folder]/frameworks/docbook/xsl/roundtrip/wordml2normalise.xsl stylesheet. Add the other 3 stylesheets to the scenario using the Additional XSLT stylesheets button.

The result of the transformation, that is, the file you set in the Save As field of the dialog box, should be a DocBook XML document with the same content as the source DOCX.
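
For readers who want to see the order of the chain spelled out outside the Oxygen dialog, here is a minimal sketch in Python. It is only an illustration and not part of the Oxygen procedure: it assumes the xsltproc command-line processor is installed and that the roundtrip stylesheets run under it, neither of which is confirmed in this thread. The function name build_chain, the intermediate file names (stage1.xml and so on) and the /opt/oxygen folder are made up for the example. The function only builds the commands and does not run them.

```python
from pathlib import PurePosixPath

# The four roundtrip stylesheets, in the order given in the procedure.
ROUNDTRIP_STAGES = [
    "wordml2normalise.xsl",
    "normalise2sections.xsl",
    "sections2blocks.xsl",
    "blocks2dbk.xsl",
]


def build_chain(oxygen_folder, source_xml, result_xml):
    """Return one xsltproc command (as an argument list) per stage.

    Each stage reads the output of the previous one; the last stage
    writes to result_xml. Intermediate file names are hypothetical.
    """
    xsl_dir = PurePosixPath(oxygen_folder) / "frameworks/docbook/xsl/roundtrip"
    commands = []
    current_input = source_xml
    for index, stage in enumerate(ROUNDTRIP_STAGES, start=1):
        is_last = index == len(ROUNDTRIP_STAGES)
        output = result_xml if is_last else f"stage{index}.xml"
        commands.append(
            ["xsltproc", "-o", output, str(xsl_dir / stage), current_input]
        )
        current_input = output
    return commands


if __name__ == "__main__":
    # "/opt/oxygen" is a placeholder for [Oxygen-folder].
    for command in build_chain("/opt/oxygen", "document.xml", "result.xml"):
        print(" ".join(command))
```

Each command feeds its output to the next stage, which mirrors what the scenario does when the first stylesheet is set in XSL URL and the other three are added as additional stylesheets.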


Regards,
Sorin

Re: docx to docbook

Posted: Mon Nov 14, 2011 1:48 pm
by ramjill
Hi,

I tried what you said, but after creating and running the DOCX to DocBook XSLT transformation, I got an error:

Severity: error
Description: Cannot apply cascading transformation. Reason: org.xml.sax.SAXParseException: Content is not allowed in prolog.

Re: docx to docbook

Posted: Mon Nov 14, 2011 2:10 pm
by ramjill
Hi,

Can you explain with screenshots how to create an XSLT transformation?

It would be helpful for me.

Thanks in advance.

Re: docx to docbook

Posted: Mon Nov 14, 2011 2:20 pm
by sorin_ristache
Hi,
ramjill wrote: I tried what you said, but after creating and running the DOCX to DocBook XSLT transformation, I got an error:

Severity: error
Description: Cannot apply cascading transformation. Reason: org.xml.sax.SAXParseException: Content is not allowed in prolog.

What is the namespace declaration of the root element of the document.xml file? Is it xmlns:w="http://schemas.openxmlformats.org/wordp ... /2006/main"? If yes, the DocBook stylesheets must first be modified to process XML documents with this namespace (http://schemas.openxmlformats.org/wordp ... /2006/main). I will submit a request for enhancement to the DocBook XSL package for handling this namespace.

If not, please send us a sample DOCX file for reproducing the problem.
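
If it helps, the namespace can also be checked without opening the file in Oxygen. The following is a minimal sketch in Python, assuming the main document part is stored at word/document.xml inside the DOCX archive, as in a standard DOCX. The function name root_namespace and the file name sample.docx are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET


def root_namespace(docx):
    """Return the namespace URI of the root element of word/document.xml.

    docx may be a path or a binary file-like object. An empty string is
    returned when the root element is not in any namespace.
    """
    with zipfile.ZipFile(docx) as archive:
        with archive.open("word/document.xml") as part:
            # The first "start" event is always the root element.
            for _event, element in ET.iterparse(part, events=("start",)):
                tag = element.tag  # ElementTree form: "{namespace}localname"
                if tag.startswith("{"):
                    return tag[1:].split("}", 1)[0]
                return ""
    return ""


if __name__ == "__main__":
    import sys

    # Usage: python check_ns.py sample.docx
    if len(sys.argv) > 1:
        print(root_namespace(sys.argv[1]))
```

The printed value can then be compared with the xmlns:w value declared on the root element of wordml2normalise.xsl.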


Regards,
Sorin

Re: docx to docbook

Posted: Mon Nov 14, 2011 2:27 pm
by sorin_ristache
ramjill wrote: Hi,

Can you explain with screenshots how to create an XSLT transformation?

It would be helpful for me.

Thanks in advance.
In my first reply I included a link to the User Manual topic about creating an XSLT transformation. You can see there some screenshots of the 3 tabs of this dialog box and the explanations for the parameters that can be set in the dialog box.


Regards,
Sorin

Re: docx to docbook

Posted: Mon Nov 14, 2011 2:48 pm
by ramjill
Hi,

Yes, the namespace declaration of the root element of the document.xml file is xmlns:w="http://schemas.openxmlformats.org/wordp ... /2006/main".

So I changed the namespace in the wordml2normalise.xsl stylesheet to xmlns:w="http://schemas.openxmlformats.org/wordp ... /2006/main".

Is that correct?

After that I ran the transformation, but I got two errors:

Failed to parse stylesheet
xpointer attribute must be present when href attribute is absent.

How can I resolve this? Did I make a mistake?

Re: docx to docbook

Posted: Mon Nov 14, 2011 3:30 pm
by sorin_ristache
I think you did not replace the namespace correctly. I replaced the value of the xmlns:w attribute that is declared on the root element of wordml2normalise.xsl, and the transformation reports no errors, but the result XML file contains only a list of paragraphs. That means more changes must be applied to the stylesheets to adjust them to the namespace (and structure) of Word documents. As I said, I will submit a request for enhancement to the DocBook XSL developers for handling this namespace correctly in the DocBook XSL stylesheets.


Regards,
Sorin

Re: docx to docbook

Posted: Mon Nov 14, 2011 4:03 pm
by ramjill
Hi,

I got this error from the blocks2dbk.xsl file after replacing the namespace:

Engine name: Saxon6.5.5
Severity: error
Description: Failed to parse stylesheet


Engine name: Saxon6.5.5
Severity: fatal
Description: xpointer attribute must be present when href attribute is absent.
Start location: 178:21


<xsl:when test='@rnd:style = "d:xinclude"'
          xmlns:xi='http://www.w3.org/2001/XInclude'>
  <xi:include>
    <xsl:attribute name='href'>
      <xsl:apply-templates mode='rnd:xinclude'/>
    </xsl:attribute>
  </xi:include>
</xsl:when>

The error is reported at the tag above.
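
A possible reading of this error, which is an assumption and not something confirmed in this thread: the xi:include element in the stylesheet has no href attribute of its own (the attribute is only created at transformation time by xsl:attribute), so a parser that processes XInclude while loading the stylesheet may reject it. The Python sketch below only lists such elements in a stylesheet so they can be inspected; it does not fix anything. The function name includes_without_href is hypothetical.

```python
import xml.etree.ElementTree as ET

# XInclude namespace, as declared in the stylesheet fragment above.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def includes_without_href(stylesheet_text):
    """Return the xi:include elements that have neither href nor xpointer.

    ElementTree does not process XInclude while parsing, so the
    stylesheet can be loaded here even if another parser rejects it.
    """
    root = ET.fromstring(stylesheet_text)
    return [
        element
        for element in root.iter(XI_INCLUDE)
        if "href" not in element.attrib and "xpointer" not in element.attrib
    ]


if __name__ == "__main__":
    import sys

    # Usage: python find_includes.py blocks2dbk.xsl
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            found = includes_without_href(handle.read())
        print(len(found), "xi:include element(s) without href or xpointer")
```

If this reading is right, turning off XInclude processing in the parser that loads the stylesheet would be one thing to try, but that is a guess and not a tested fix.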

Re: docx to docbook

Posted: Mon Nov 14, 2011 4:30 pm
by sorin_ristache
Hi,

I think the changes that must be applied to the DocBook XSL stylesheets are not trivial. If you want to correct the stylesheets yourself, please contact the DocBook XSL developers. They can help you with this task.


Regards,
Sorin

Re: docx to docbook

Posted: Thu Nov 14, 2013 9:57 pm
by kec
This is an older post, but a current topic for me. If I follow the instructions today with the latest version of Oxygen, I get the same exceptions.

Has anyone gotten the scenario to work with current versions of Word (I'm using Word for Mac 2011)? Does anyone know the secret handshake required to get it to work?

Thanks,

Keith

Re: docx to docbook

Posted: Fri Nov 15, 2013 11:50 am
by sorin_ristache
Hi Keith,

I think the DocBook XSL stylesheets for the DOCX to DocBook conversion have not been improved yet. You can either raise the issue with the DocBook XSL project or improve these stylesheets yourself. In Oxygen the XSLT stylesheets for the DOCX to DocBook conversion are located in the directory:

[Oxygen-install-dir]/frameworks/docbook/xsl/roundtrip

Regards,
Sorin