[oXygen-user] Feature request - Copy-and-paste from Word/Excel/HTML to DITA

Wendell Piez
Fri Jun 20 11:05:09 CDT 2008


Hi,

At 09:13 AM 6/20/2008, Sorin wrote:
>Version 9.3 which will be released in a couple of weeks will include 
>an Archive Browser view that is able to open and browse Word and 
>Excel documents saved in XML format, that is .docx files and .xlsx 
>files. In the Archive Browser view the files that are included in 
>such a Word or Excel document can be opened and edited in Oxygen so 
>migrating the data to a DITA document will be easy: just apply an 
>XSLT stylesheet to the XML file containing the data that must be imported.

I like this approach, as there are several other XML vocabularies 
that might be wanted as targets for upconversion. Nothing against 
DITA, of course, but it makes sense to consider requirements for 
other tag sets as well.

However, those of us who have even glanced at .docx format know that 
it's a ravenous beast of unorthodox tagging practice, for which it 
will be a challenge to write stylesheets.

One solution to this problem would entail a generic stylesheet that 
will upconvert .docx into a more regular and proper sort of XML, in 
which (just to mention the most glaring problem) mixed content is 
actually mixed content. Such a plain vanilla word-processing XML 
would make a much more tractable source format for conversion into 
arbitrary targets such as DITA or what have you.
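
To make the mixed-content point concrete, here is a minimal sketch of 
the idea, in Python rather than XSLT only so it can be run as is. It 
assumes a tiny hand-written WordprocessingML fragment and handles 
nothing but bold runs; the output names (doc, p, b) and the 
upconvert function are hypothetical, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used inside a .docx package.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written sample (an assumption for illustration): one paragraph
# whose text is spread over three runs, the middle one bold.
SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">Mixed content is </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>actually</w:t></w:r>
      <w:r><w:t xml:space="preserve"> mixed.</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def append_text(parent, text):
    """Append character data to parent, after its last child if any."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def upconvert(docx_xml):
    """Turn each w:p into a <p>; bold runs become inline <b> elements.

    Simplification: any w:b in the run properties counts as bold, so
    an explicit w:val="false" is not honored here.
    """
    root = ET.fromstring(docx_xml)
    out = ET.Element("doc")
    for para in root.iterfind(".//w:p", NS):
        p = ET.SubElement(out, "p")
        for run in para.iterfind("w:r", NS):
            text = "".join(t.text or "" for t in run.iterfind("w:t", NS))
            if run.find("w:rPr/w:b", NS) is not None:
                ET.SubElement(p, "b").text = text
            else:
                append_text(p, text)
    return out


result = ET.tostring(upconvert(SAMPLE), encoding="unicode")
print(result)
# prints: <doc><p>Mixed content is <b>actually</b> mixed.</p></doc>
```

A comprehensive version would also have to deal with styles, lists, 
tables, fields and so on, which this sketch ignores entirely.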

I dare say this stylesheet will be a devil to write, especially if it 
aims to be comprehensive. All the more reason to solve this problem 
once instead of making everyone solve it on their own.

An alternative (which might be more feasible) would be a library of 
XSLT templates and functions that help take care of the hard parts.
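
As a sketch of what one such helper might do, again in Python so it 
can be run directly: Word frequently splits a stretch of uniformly 
formatted text across several adjacent runs, and merging them is a 
plausible candidate for one of the hard parts. The merge_runs 
function and the (formatting, text) pair representation are 
assumptions made for illustration only.

```python
from itertools import groupby


def merge_runs(runs):
    """Collapse adjacent (formatting, text) pairs with equal formatting.

    Each run is a pair: a frozenset of formatting flags and a string.
    Only neighbors are merged, so document order is preserved.
    """
    merged = []
    for fmt, group in groupby(runs, key=lambda run: run[0]):
        merged.append((fmt, "".join(text for _, text in group)))
    return merged


# Hypothetical input: five runs that are really two stretches of text.
runs = [
    (frozenset(), "up"),
    (frozenset(), "con"),
    (frozenset(), "version "),
    (frozenset({"b"}), "is "),
    (frozenset({"b"}), "hard"),
]
merged = merge_runs(runs)
print(merged)
```

In XSLT 2.0 the same grouping could presumably be expressed with 
xsl:for-each-group and group-adjacent.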

Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================



