[oXygen-user] [oXygen XML Editor Blog] - Possibilities to obtain PDF from DITA

Tue Nov 17 22:14:15 CST 2015

On 17/11/2015 4:30 pm, oXygen XML Editor Blog wrote:
> oXygen XML Editor Blog
> 
> ///////////////////////////////////////////
> Possibilities to obtain PDF from DITA
> 
> Posted: 16 Nov 2015 01:07 PM PST
> http://feedproxy.google.com/~r/AboutOxygenXmlEditor/~3/sXZMCVP56Pc/possibilities-to-obtain-pdf-from-dita.html?utm_source=feedburner&utm_medium=email

[Note: this began as a comment on the blog, but Blogger hates me and
may have eaten it, so here it is.]

Another reason is that Apache's FOP suffers from all sorts of bugs (as
we've discussed before), and clearly the timeframe for resolving these
hasn't been good enough for the companies that need it, so the market
responded.  From the looks of things a lot of the solutions are built
on .NET, so there's still room for more entrants if they can meet the
needs of the non-Microsoft market.

Also, there are other methods to get what people want (and no one
really cares what the method is if it works and can be entirely
automated).

DITA to [X]HTML + CSS then generate the PDF using wkhtmltopdf.

The advantage is that it just works.  The disadvantage is that the
WebKit-based solution Google used is built on Qt 4.8 and, like many
browsers, it does not render the text-justify CSS property.  If you
want justified text in print, you'll need another solution.
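A minimal sketch of the wkhtmltopdf step, assuming the DITA-OT output has already been produced as a single HTML file with its CSS alongside.  The helper name html_to_pdf, the WKHTMLTOPDF override variable and the file names are hypothetical conveniences, not part of any existing tool; --page-size is a standard wkhtmltopdf option.

```shell
#!/bin/sh
# Hypothetical helper: render one HTML file (plus its CSS) to PDF.
# WKHTMLTOPDF is an assumed override; it defaults to the binary on PATH.
: "${WKHTMLTOPDF:=wkhtmltopdf}"

# html_to_pdf INPUT.html OUTPUT.pdf [PAGE_SIZE]
html_to_pdf() {
  input=$1
  output=$2
  size=${3:-A4}   # paper size, e.g. A4 or Letter
  # Global options (like --page-size) go before the input and output files.
  "$WKHTMLTOPDF" --page-size "$size" "$input" "$output"
}
```

For example, after sourcing the file, `html_to_pdf book.html book.pdf Letter` produces a Letter-sized PDF.  Justified text is still subject to the limitation described above.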

Since LibreOffice can utilise XSL files and take instructions via the
command line, it may be quite possible to use that in conjunction with
any OpenDocument template (.ott files), though I haven't tried it yet.
Its use of XSL files, however, requires them to be loaded through the
GUI first.

Alternatively there may be an avenue to ignore PDF entirely by
implementing Microsoft's Open XML Paper Specification (.oxps files),
but it probably requires implementing vast swathes of the .NET
framework in order to provide XAML support in a platform-independent
way (a shame, since OpenXPS looked interesting and, having seen what
PDF really is underneath, an alternative is always appealing).

Another option, once existing transformations to HTML or DOCX are
included, is to tap pandoc again and use its PDF generation.  That
utilises LaTeX, however, and the result looks quite different to most
other PDFs.  Controlling that layout requires a detailed understanding
of LaTeX's syntax.
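A minimal sketch of the pandoc route, assuming a single-page HTML export and a working LaTeX installation (pandoc picks PDF output from the .pdf extension and hands it to LaTeX).  The helper name, the PANDOC override and the 2.5cm margin are hypothetical; `-V geometry:margin=...` sets a variable read by pandoc's default LaTeX template, which is about as far as layout control goes without touching LaTeX itself.

```shell
#!/bin/sh
# Hypothetical helper: HTML -> PDF via pandoc and LaTeX.
# PANDOC is an assumed override; it defaults to pandoc on PATH.
: "${PANDOC:=pandoc}"

# html_to_pdf_latex INPUT.html OUTPUT.pdf [MARGIN]
html_to_pdf_latex() {
  margin=${3:-2.5cm}   # page margin passed to the LaTeX geometry package
  "$PANDOC" "$1" -o "$2" -V "geometry:margin=$margin"
}
```

Anything beyond margins and similar template variables (headers, footers, chapter styling) means writing a custom LaTeX template.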

I have yet to see any solution which utilises Apple's Quartz Core
framework on which OS X is built.  Which is kind of odd given how
extensive XML is in OS X configuration and that PDF is one of the true
essential foundations of Quartz.  It's quite possible that everything
is already in place, but Apple just hasn't told anyone because that's
their secret.  They can be a bit weird sometimes.

I haven't been too worried about PDF or print so far, but if that
changes, and spending thousands of dollars isn't an option (it
isn't), I suspect I'd end up finding the best way to feed documents
back to LibreOffice and let it print them or produce the PDF,
depending on which path required more fiddling.  It might even be
possible to get LibreOffice to process the output of the
com.oxygenxml.pdf.css plugin.  If that works then you've got a free
framework that runs on OS X, Linux, Windows, Solaris, BSD and other
bits and pieces.  Admittedly it's a 660MB framework, but that's still
smaller than a full LaTeX installation.

... oh damn, now I'm curious ... [at this point some time passed] ...

Okay, the quickest and easiest method is to run the built-in DITA Map
to ODF plugin and then have LibreOffice convert the generated file
with this command:

[/path/to/]soffice --headless --convert-to pdf filename.odt

Change the path to the soffice binary to whatever is relevant for your
platform.  For OS X users that should be:

/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to pdf filename.odt
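If the DITA Map to ODF transformation produces several .odt files, the same command can be wrapped for batch use.  This is a minimal sketch: the helper name odt_to_pdf and the SOFFICE override are hypothetical, while --headless, --convert-to and --outdir are standard soffice options.

```shell
#!/bin/sh
# Hypothetical helper: convert one or more ODT files to PDF with LibreOffice.
# SOFFICE is an assumed override; it defaults to soffice on PATH.
# On OS X, set it to /Applications/LibreOffice.app/Contents/MacOS/soffice.
: "${SOFFICE:=soffice}"

# odt_to_pdf OUTDIR FILE.odt [FILE.odt ...]
odt_to_pdf() {
  outdir=$1
  shift
  mkdir -p "$outdir"   # make sure the output directory exists
  "$SOFFICE" --headless --convert-to pdf --outdir "$outdir" "$@"
}
```

For example, `odt_to_pdf pdf out/*.odt` writes one PDF per input file into ./pdf.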

It may be preferable to adjust the OpenDocument template used for the
generated file first.  Assuming the DITA-OT plugin already uses an ODF
template, it shouldn't be too difficult to make a duplicate scenario
that allows alternative templates to be specified.  If that is the
case, that's everything.

Some quick grepping, however, indicates that the default ODF output is
entirely XSL-driven, so chances are modifications will involve
adjusting the ./xsl/xslodt/dita2odtstyles.xsl file in the plugin.
This may be an issue if, for example, you think using Courier for
footnotes is just annoying when all the rest of the text uses a
variable-width sans-serif font.

If XSL editing is to be avoided at all costs, then convert to DOCX,
DocBook or a single-page HTML file (if you leave them separate you
will have a lot of typing to do) and use that with pandoc, since
pandoc can accept an ODF file as a style reference on the command
line with the --reference-odt option (renamed --reference-doc in
pandoc 2.0).  Using this will completely override the existing
formatting, so if you just want to adjust the fonts chosen by the
current DITA-OT plugin template via the pandoc method, use
LibreOffice to create a new template based on the styles of a
document generated with the DITA-OT plugin.  (Within LO it's possible
to import styles from any file, whether it's a template or not, and
use them to make a new template, which means if you want to nick the
style used in the LibreOffice documentation then you can).
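A minimal sketch of the pandoc step, assuming pandoc 2.0 or later (older releases, current when this was written, spell the option --reference-odt).  The helper name, the PANDOC override and the file names are hypothetical; the resulting ODT can then go through the soffice --convert-to pdf command above.

```shell
#!/bin/sh
# Hypothetical helper: DOCX (or single-page HTML) -> ODT, styled by a
# reference ODT made in LibreOffice. PANDOC is an assumed override.
: "${PANDOC:=pandoc}"

# restyle_to_odt INPUT OUTPUT.odt REFERENCE.odt
restyle_to_odt() {
  # pandoc infers the input format from the extension (.docx, .html);
  # DocBook input would need an explicit -f docbook.
  "$PANDOC" "$1" -o "$2" --reference-doc="$3"
}
```

For example, `restyle_to_odt book.docx book.odt house-style.odt` takes its paragraph and character styles from house-style.odt rather than pandoc's defaults.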

Since LibreOffice can't use an ODF template via the headless command
(very annoying, but it can't be helped yet), the prize here goes to
the first person or group who converts the existing DITA Map to ODF
plugin to utilise an ODF template or, more likely, creates an ODF to
DITA transformation that includes an option to generate XSL files to
replace the current style XSL files (so ODF to XSL via an XSLT),
since there's no real difference between an ODT and an OTT other than
how the office suite(s) treat them.

The pandoc variant will extend this to work with DOCX and its
template files too, but I only use DOCX as a stepping stone between
various formats (it appears to be the "gateway drug" for all file
formats, at least until DITA can cover everything).

Anyway, the important point is that it does work and will work on any
system with either LibreOffice or OpenOffice.org, and possibly with
AbiWord (I haven't used that one, so can't confirm).  For my part I'm
trying to avoid PDF where possible (in part because I started reading
the spec earlier this year for reasons better left to the mists of
time), so I'd just go for the quick and easy convert to ODT and then
adjust the styles before printing or exporting to PDF.  Either that or
spend a day or two fiddling to get the built-in plugin to use better
fonts and then leave it as is.  It probably needs some tweaking
anyway: it didn't include things like a page break between the cover
page and the copyright page, which was a little surprising.  No doubt
the more time spent on it, the less suckage it will spout (to a
limited extent with PDF as the end point).

Regards,
Ben

P.S.  You can cheat by generating an ePub and then converting that to
      PDF with the consumer's choice for file conversion: Calibre.
      Just don't rely on it, because there are zero guarantees; it was
      never intended for anything other than managing personal ebook
      collections and shifting between devices (which is why I turned
      up in the first place; well, that and Sigil breaking ePub 3.0
      files).  I'd use it to test drafts and the like, but never in
      production.
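For the draft-only Calibre route, Calibre ships a command-line converter, ebook-convert, that takes an input and an output file and picks the conversion from the extensions.  A minimal sketch; the helper name and the EBOOK_CONVERT override are hypothetical, and the same caveat applies: fine for checking drafts, not for production.

```shell
#!/bin/sh
# Hypothetical helper: ePub -> PDF with Calibre's ebook-convert, for drafts.
# EBOOK_CONVERT is an assumed override; it defaults to ebook-convert on PATH.
: "${EBOOK_CONVERT:=ebook-convert}"

# epub_to_pdf FILE.epub  (writes FILE.pdf next to it)
epub_to_pdf() {
  "$EBOOK_CONVERT" "$1" "${1%.epub}.pdf"
}
```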
