[oXygen-user] [oXygen XML Editor Blog] - Possibilities to obtain PDF from DITA


Wed Nov 18 01:50:01 CST 2015


 

Hi, 

I have no idea what DITA is, but for generating PDF from XHTML + CSS 2.1,
xhtmlrenderer (aka Flying Saucer) works perfectly. The project is quite
inactive, as all CSS 2.1 features have been implemented and no bugs have
been found since, but it is still reliable. It relies on iText to produce
the PDF.

In Java, it is very easy to chain XSL transformation(s) and PDF
creation. So a DITA-to-XHTML transformation plus CSS will do the job...

The project's home is here: https://code.google.com/p/flying-saucer/ but
the code has long been hosted on GitHub.

Hope this helps... 

Christophe 

On 2015-11-18 08:30, Radu Coravu wrote:

> Hi Ben,
> 
> I'm not sure about the blog post; just logging in to Google and posting a comment should have been enough.
> 
>> Another reason is that Apache's FOP suffers all sorts of bugs (as
>> we've discussed before) and clearly the timeframe on resolving these
>> hasn't been good enough for the companies that need it, so the market
>> responded.  From the looks of things a lot of the solutions are built
>> on .NET, so there's still room for more entrants in the market if they
>> can meet the needs of the non-Microsoft market.
> At the same time, it might work for most projects. It may also depend on the level of customization you need.
> I would consider a PDF printing solution complete if:
> 
> 1) It generates a table of contents.
> 2) It generates an index page.
> 3) It properly generates links between pages.
> 4) It presents images.
> 5) It allows for extra customization.
> 6) Ideally it is multi platform.
> 
>> DITA to [X]HTML + CSS then generate the PDF using wkhtmltopdf.
>> 
>> The advantage is it just works.  The disadvantage is that the webkit
>> based solution Google used is built on Qt 4.8 and, like many browsers,
>> it does not render the text-justify style in CSS.  If you want
>> justified text in print, you'll need another solution.
> Yes, so XHTML to PDF would be a solution. Not sure how it would handle (1), (2) from the list above.
> 
>> Since LibreOffice can utilise XSL files and take instructions via the
>> command line, it may be quite possible to use that in conjunction with
>> any OpenDocument template (.ott files), though I haven't tried it yet.
>> Its use of XSL files requires them being loaded through the GUI first.
> This might also work; I haven't tried it myself either.
> 
>> Alternatively there may be an avenue to ignore PDF entirely by
>> implementing Microsoft's Open XML Paper Specification (.oxps files),
>> but it probably requires implementing vast swathes of the .NET
>> framework in order to provide XAML support in a platform independent
>> way (a shame since OpenXPS looked interesting and having seen what PDF
>> really is underneath an alternative is always appealing).
> People want the PDFs not only for print but also for web.
> PDF has become ubiquitous in tech docs, although I doubt most of our end users actually use it to search for information.
> But most companies still want it, in some (possibly most) cases because of inertia.
> 
>> Another option once existing transformations to HTML or DOCX are
>> included is to tap pandoc again and use its PDF generation.  That
>> utilises LaTeX, however, and looks quite different to most other PDFs.
>> Controlling that layout requires a detailed understanding of LaTeX's syntax.
> One of our German users had the idea of developing a plugin for converting DITA to LaTeX.
> Maybe this is also a path which would need to be investigated.
> 
>> I have yet to see any solution which utilises Apple's Quartz Core
>> framework on which OS X is built.  Which is kind of odd given how
>> extensive XML is in OS X configuration and that PDF is one of the true
>> essential foundations of Quartz.  It's quite possible that everything
>> is already in place, but Apple just hasn't told anyone because that's
>> their secret.  They can be a bit weird sometimes.
> Not multi platform :)
> 
>> I haven't been too worried about PDF or print so far, but if that
>> changes and spending thousands of dollars isn't an option (and it
>> isn't), I suspect I'd end up finding the best way to feed documents
>> back to LibreOffice and let it print them or produce the PDF,
>> depending on which path required more fiddling.  It might even be
>> possible to get LibreOffice to process the output of the
>> com.oxygenxml.pdf.css plugin.  If that works then you've got a free
>> framework that runs on OS X, Linux, Windows, Solaris, BSD and other
>> bits and pieces.  Admittedly it's a 660MB framework, but that's still
>> smaller than a full LaTeX installation.
> One problem with ODT is that I do not know how well this support will continue to be maintained and improved in the DITA OT output.
> I'm not sure if many people are interested in this format. The main DITA OT developers are not.
> From the discussions at the DITA OT Day, it is possible that at some point in the future some plugins (except the XHTML and PDF outputs) will be extracted to separate GitHub projects.
> But of course the DITA OT is open source and anybody can contribute. 
> 
> Regards,
> Radu
> 
> Radu Coravu
> <oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
> http://www.oxygenxml.com
> 
> On 18/11/15 05:14, Ben McGinnes wrote: 
> 
> On 17/11/2015 4:30 pm, oXygen XML Editor Blog wrote:
> 
> oXygen XML Editor Blog
> 
> ///////////////////////////////////////////
> Possibilities to obtain PDF from DITA
> 
> Posted: 16 Nov 2015 01:07 PM PST
> http://feedproxy.google.com/~r/AboutOxygenXmlEditor/~3/sXZMCVP56Pc/possibilities-to-obtain-pdf-from-dita.html?utm_source=feedburner&utm_medium=email
> 
> [Note: this began as a comment on the blog, but Blogger hates me and
> may have eaten it, so here it is.]
> 
> Another reason is that Apache's FOP suffers all sorts of bugs (as
> we've discussed before) and clearly the timeframe on resolving these
> hasn't been good enough for the companies that need it, so the market
> responded.  From the looks of things a lot of the solutions are built
> on .NET, so there's still room for more entrants in the market if they
> can meet the needs of the non-Microsoft market.
> 
> Also, there are other methods to get what people want (and no one
> really cares what the method is if it works and can be entirely
> automated).
> 
> DITA to [X]HTML + CSS then generate the PDF using wkhtmltopdf.
> 
> The advantage is it just works.  The disadvantage is that the webkit
> based solution Google used is built on Qt 4.8 and, like many browsers,
> it does not render the text-justify style in CSS.  If you want
> justified text in print, you'll need another solution.
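
A minimal sketch of scripting the wkhtmltopdf step described above, assuming the wkhtmltopdf binary is on the PATH and the DITA content has already been transformed to HTML. The function names and file names are illustrative only. The optional "toc" argument asks wkhtmltopdf to generate a table of contents before the pages; how well that suits DITA output has not been checked here.

```python
import subprocess


def build_wkhtmltopdf_command(html_files, output, with_toc=False):
    """Return the argument list for rendering HTML pages to one PDF."""
    if not html_files:
        raise ValueError("at least one HTML input file is required")
    cmd = ["wkhtmltopdf"]
    if with_toc:
        # "toc" is a wkhtmltopdf object that inserts a generated
        # table of contents ahead of the page objects.
        cmd.append("toc")
    cmd.extend(str(f) for f in html_files)
    cmd.append(str(output))
    return cmd


def render_pdf(html_files, output, with_toc=False):
    """Run wkhtmltopdf; raises CalledProcessError if it fails."""
    subprocess.run(build_wkhtmltopdf_command(html_files, output, with_toc),
                   check=True)
```

For example, render_pdf(["guide.html"], "guide.pdf", with_toc=True) would run "wkhtmltopdf toc guide.html guide.pdf".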
> 
> Since LibreOffice can utilise XSL files and take instructions via the
> command line, it may be quite possible to use that in conjunction with
> any OpenDocument template (.ott files), though I haven't tried it yet.
> Its use of XSL files requires them being loaded through the GUI first.
> 
> Alternatively there may be an avenue to ignore PDF entirely by
> implementing Microsoft's Open XML Paper Specification (.oxps files),
> but it probably requires implementing vast swathes of the .NET
> framework in order to provide XAML support in a platform independent
> way (a shame since OpenXPS looked interesting and having seen what PDF
> really is underneath an alternative is always appealing).
> 
> Another option once existing transformations to HTML or DOCX are
> included is to tap pandoc again and use its PDF generation.  That
> utilises LaTeX, however, and looks quite different to most other PDFs.
> Controlling that layout requires a detailed understanding of LaTeX's syntax.
> 
> I have yet to see any solution which utilises Apple's Quartz Core
> framework on which OS X is built.  Which is kind of odd given how
> extensive XML is in OS X configuration and that PDF is one of the true
> essential foundations of Quartz.  It's quite possible that everything
> is already in place, but Apple just hasn't told anyone because that's
> their secret.  They can be a bit weird sometimes.
> 
> I haven't been too worried about PDF or print so far, but if that
> changes and spending thousands of dollars isn't an option (and it
> isn't), I suspect I'd end up finding the best way to feed documents
> back to LibreOffice and let it print them or produce the PDF,
> depending on which path required more fiddling.  It might even be
> possible to get LibreOffice to process the output of the
> com.oxygenxml.pdf.css plugin.  If that works then you've got a free
> framework that runs on OS X, Linux, Windows, Solaris, BSD and other
> bits and pieces.  Admittedly it's a 660MB framework, but that's still
> smaller than a full LaTeX installation.
> 
> ... oh damn, now I'm curious ... [at this point some time passed] ...
> 
> Okay, the quickest and easiest method is to use LibreOffice with the
> built-in DITA Map to ODF plugin and then use this command on the
> generated file:
> 
> [/path/to/]soffice --headless --convert-to pdf filename.odt
> 
> Change the path to the soffice binary to whatever is relevant for your
> platform.  For OS X users that should be:
> 
> /Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to pdf filename.odt
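
A minimal sketch of wrapping the headless conversion above in a script, assuming soffice is either on the PATH or supplied by its full path. The helper names are illustrative, and the optional --outdir argument is an addition not mentioned in the message; without it soffice writes the PDF to the current working directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(soffice, source, outdir=None):
    """Return the argument list for a headless ODT-to-PDF conversion."""
    cmd = [str(soffice), "--headless", "--convert-to", "pdf"]
    if outdir is not None:
        # Send the generated PDF to a specific directory.
        cmd += ["--outdir", str(outdir)]
    cmd.append(str(source))
    return cmd


def convert_to_pdf(source, soffice=None, outdir=None):
    """Run the conversion and return the expected path of the PDF."""
    soffice = soffice or shutil.which("soffice")
    if soffice is None:
        raise FileNotFoundError("soffice not found; pass its full path")
    subprocess.run(build_convert_command(soffice, source, outdir), check=True)
    target_dir = Path(outdir) if outdir is not None else Path.cwd()
    return target_dir / (Path(source).stem + ".pdf")
```

On OS X the call would look like convert_to_pdf("filename.odt", soffice="/Applications/LibreOffice.app/Contents/MacOS/soffice").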
> 
> It may be preferable to adjust the OpenDocument template used with the
> generated file first and assuming the DITA-OT plugin is already using
> an ODF template it shouldn't be too difficult to make a duplicate
> scenario allowing alternative templates to be specified.  If that is
> the case, that's everything.
> 
> Some quick grepping, however, indicates that the default ODF output is
> entirely XSL driven, so chances are modifications will involve
> adjusting the ./xsl/xslodt/dita2odtstyles.xsl file in the plugin.
> This may be an issue if, for example, you think using Courier for
> footnotes is just annoying when all the rest of the text uses a
> variable width sans serif font.
> 
> If XSL editing is to be avoided at all costs then convert to DOCX,
> DocBook or a single page HTML file (if you leave them separate you
> will have a lot of typing to do) and use that with pandoc as pandoc
> can accept an ODF template on the command line with the
> --reference-odt option.  Using this will completely override existing
> formatting so if
> you just want to adjust the current template's chosen fonts in the
> DITA-OT plugin via the Pandoc method, use LibreOffice to create a new
> template based on the style of a document generated with the DITA-OT
> plugin (it's possible within LO to import styles from any file,
> whether it's a template or not, which can be used to make a new
> template based on it - which means if you want to nick the style used
> in the LibreOffice documentation then you can).
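
A minimal sketch of the pandoc route described above, assuming pandoc is on the PATH. The function name and file names are hypothetical. Pandoc 1.x spells the template option --reference-odt, while pandoc 2.0 and later use --reference-doc, so the flag is selectable here; the template only takes effect when the output format is ODT.

```python
import subprocess


def build_pandoc_command(source, output, reference, legacy=False):
    """Return the pandoc argument list for a styled ODT conversion.

    legacy=True selects the pandoc 1.x flag name (--reference-odt);
    the default is the pandoc 2.x name (--reference-doc).
    """
    flag = "--reference-odt" if legacy else "--reference-doc"
    return ["pandoc", str(source), "-o", str(output),
            "{}={}".format(flag, reference)]


def convert_with_template(source, output, reference, legacy=False):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(source, output, reference, legacy),
                   check=True)
```

The resulting ODT can then be turned into a PDF with the headless soffice command given earlier in the message.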
> 
> Since LibreOffice can't use an ODF template via the headless command
> (very annoying, but can't be helped yet) the prize here goes to the
> first person or group who converts the existing DITA Map to ODF plugin
> to utilise an ODF template or, more likely, to create an ODF to DITA
> transformation which includes an option to generate XSL files to
> replace the current style XSL files (so ODF to XSL via an XSLT) since
> there's no real difference between an ODT and an OTT other than how
> the office suite(s) treat them.
> 
> The pandoc variant will extend this to work with DOCX and its
> template files too, but I only use DOCX as a stepping stone between
> various formats (it appears to be the "gateway drug" for all file
> formats, at least until DITA can cover everything).
> 
> Anyway, the important point is that it does work and will work on any
> system with either LibreOffice or OpenOffice.org and possibly with
> AbiWord (I haven't used that one, so can't confirm).  For my part I'm
> trying to avoid PDF where possible (in part because I started reading
> the spec earlier this year for reasons better left to the mists of
> time), so I'd just go for the quick and easy convert to ODT and then
> adjust the styles before printing or exporting to PDF.  Either that or
> spend a day or two fiddling to get the built-in plugin to use better
> fonts and then leave it as is.  It probably needs some tweaking
> anyway, since it didn't include things like a page break between the
> cover page and the copyright page, which was a little surprising.  No
> doubt the more time spent on it, the less suckage it will spout (to a
> limited extent with PDF as the end point).
> 
> Regards,
> Ben
> 
> P.S.  You can cheat by generating an ePub and then converting that to
> PDF with the consumer's choice for file conversion: Calibre.
> Just don't rely on it because there are zero guarantees, it was
> never intended for anything other than managing personal ebook
> collections and shifting between devices (which is why I turned
> up in the first place, well due to that and Sigil breaking ePub
> 3.0 files).  I'd use it to test drafts and the like, but never
> in production.
> 
> _______________________________________________
> oXygen-user mailing list
> 
> https://www.oxygenxml.com/mailman/listinfo/oxygen-user
