debug ideas for extremely long PDF processing time?

nharrison
Posts: 10
Joined: Fri May 03, 2013 7:53 pm

Post by nharrison »

So, as usual, Radu found a way to help me debug and fix my problem with fonts not appearing; it turned out to be a basic build problem rather than a font problem.

Now, my output is building, but incredibly slowly. A book that takes about 15 seconds to run through the generic PDF2 plugin takes over 2 minutes to run through mine.

Can anyone suggest where I could add debugging messages in my style sheets (or anywhere else, like in the build file, though that just calls the PDF2 build file) to figure out what's taking so long?
Or if anyone has any ideas on what might make a PDF2-based PDF plugin transform take so long, I'd love to hear them.

Thanks,
Nancy
Radu
Posts: 9055
Joined: Fri Jul 09, 2004 5:18 pm

Re: debug ideas for extremely long PDF processing time?

Post by Radu »

Hi Nancy,

First, as the transformation progresses, you could watch the console view and note at which stage it stalls for a long time without outputting any message.
The problem could be either an XSLT customization (try commenting out your customization templates one by one and see when the transformation becomes fast again) or some place where a resource is read from a remote (web) location.
Some time ago I wrote a topic in our user guide about using the Oxygen XSLT debugger to debug the PDF transformation:

http://www.oxygenxml.com/doc/ug-oxygen/ ... ation.html

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
nharrison
Posts: 10
Joined: Fri May 03, 2013 7:53 pm

Re: debug ideas for extremely long PDF processing time?

Post by nharrison »

I managed, by adding messages at the beginning and end of every template I'd added or modified for the plugin, to isolate the problem. It's being caused by the time it takes to open and get data from SVG files, but I don't know why it's taking so long. The template that is slowing things down is:

Code:

<xsl:template name="get-raw-image-height">
  <xsl:param name="href"/>
  <!-- Load the SVG file referenced by the image -->
  <xsl:variable name="svg-file" select="document($href)"/>
  <xsl:variable name="viewbox">
    <xsl:if test="document($href)/svg:svg">
      <xsl:value-of select="$svg-file/svg:svg/@viewBox"/>
    </xsl:if>
  </xsl:variable>
  <!-- viewBox is expected in the form "0 0 width height" -->
  <xsl:variable name="viewbox-width-height" select="substring-after($viewbox,'0 0 ')"/>
  <xsl:variable name="raw-height" select="substring-after($viewbox-width-height,' ')"/>
  <xsl:value-of select="$raw-height"/>
</xsl:template>
and the slowdown comes during the <xsl:if> operation. Each time it has to open a new SVG file, it takes about 3.25 seconds to do so. So a document with about 40 images takes about 2 1/2 minutes to process (2 min, 10 sec for the images, 20 sec for the rest of the publishing).
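
For reference, the message-based tracing described above might look roughly like the following sketch. It wraps the same template in a minimal stylesheet and adds xsl:message calls; the "[trace]" prefix and the message wording are only illustrative, and the namespace declarations are assumed to match those of the real plugin stylesheet. The messages go to the build console, so a long pause between two of them points at the step in between. Note that an XSLT processor may evaluate variables lazily, so the pause can show up where a variable is first used rather than where it is declared.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svg="http://www.w3.org/2000/svg">

  <xsl:template name="get-raw-image-height">
    <xsl:param name="href"/>
    <!-- Trace message: written to the console, not to the result tree -->
    <xsl:message>[trace] get-raw-image-height start: <xsl:value-of select="$href"/></xsl:message>
    <xsl:variable name="svg-file" select="document($href)"/>
    <xsl:variable name="viewbox">
      <xsl:if test="document($href)/svg:svg">
        <xsl:value-of select="$svg-file/svg:svg/@viewBox"/>
      </xsl:if>
    </xsl:variable>
    <!-- A long pause before this message suggests the SVG load is the slow step -->
    <xsl:message>[trace] viewBox read: <xsl:value-of select="$viewbox"/></xsl:message>
    <xsl:variable name="viewbox-width-height" select="substring-after($viewbox,'0 0 ')"/>
    <xsl:variable name="raw-height" select="substring-after($viewbox-width-height,' ')"/>
    <xsl:value-of select="$raw-height"/>
    <xsl:message>[trace] get-raw-image-height end</xsl:message>
  </xsl:template>

</xsl:stylesheet>
```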

Can you suggest why the process of opening and getting data from an SVG file (mostly small files less than 10KB) would take so long? Is there a faster way to get this information from the SVG files?

Thanks,
Nancy
Radu
Posts: 9055
Joined: Fri Jul 09, 2004 5:18 pm

Re: debug ideas for extremely long PDF processing time?

Post by Radu »

Hi Nancy,

If you open one of those SVG images in Oxygen it should have a DOCTYPE header, something like:

Code:

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
Each time an SVG document is loaded by a parser (in your case by Saxon, when document($href) is called), the DTD referenced by its DOCTYPE also needs to be loaded. If the XML catalog used by the XSLT transformation has no mapping for the "-//W3C//DTD SVG 1.0//EN" public ID, the XML parser used by Saxon will download the resource from the online location http://www.w3.org/TR/2001/REC-SVG-20010 ... /svg10.dtd. This happens every time a new image is read, so 1-2 seconds of network latency add up for each image.
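
As an illustration of such a mapping, a minimal OASIS XML catalog could look like the sketch below. The public and system identifiers are the ones from the DOCTYPE above; the local path dtd/svg10.dtd is only an assumption and should point to wherever a local copy of the DTD actually lives, relative to the catalog file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Resolve the SVG 1.0 public ID to a local copy (hypothetical path) -->
  <public publicId="-//W3C//DTD SVG 1.0//EN"
          uri="dtd/svg10.dtd"/>
  <!-- Also map the system ID, in case the parser resolves by URL -->
  <system systemId="http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"
          uri="dtd/svg10.dtd"/>
</catalog>
```

With a mapping like this in place, the parser should read the DTD from disk instead of fetching it from www.w3.org for every image.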

You should probably enhance your plugin and add the plugin extension "dita.specialization.catalog.relative" in order to add a new mapping in the DITA-OT-INSTALL-DIR/catalog-dita.xml for the SVG public ID to a local DTD.

For example, to map SVG public IDs, Oxygen internally uses the DTDs and the XML catalog located in the folder:

OXYGEN_INSTALL_DIR\frameworks\svg\dtd

You can look in there for the svgcatalog.xml XML catalog file and see what DTD public IDs it maps.
You could probably copy the entire OXYGEN_INSTALL_DIR\frameworks\svg\dtd folder to your PDF customization plugin and add in your plugin the extension "dita.specialization.catalog.relative" in order to automatically add the additional mappings from the svgcatalog.xml to the main catalog-dita.xml when your plugin is installed by the DITA OT integrator.
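
A plugin.xml using that extension might look roughly like this sketch. The plugin id com.example.pdf.custom and the path svg/dtd/svgcatalog.xml are hypothetical (the path assumes the Oxygen folder was copied into the plugin as svg/dtd). The exact attribute form may also depend on the DITA-OT version: some older releases use value="..." type="file" instead of file="...".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical plugin id; use the id of your own PDF customization plugin -->
<plugin id="com.example.pdf.custom">
  <require plugin="org.dita.pdf2"/>
  <!-- Contribute the SVG catalog mappings to the main catalog-dita.xml;
       the path is relative to the plugin folder (assumed location) -->
  <feature extension="dita.specialization.catalog.relative"
           file="svg/dtd/svgcatalog.xml"/>
</plugin>
```

The mappings are only merged into catalog-dita.xml when the integrator runs, so the plugin has to be re-integrated after this change.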

Regards,
Radu