PDF Transformation Time

thedantanner
Posts: 42
Joined: Thu Oct 27, 2016 11:13 pm

Post by thedantanner »

I am working on optimizing our DITA custom PDF transformations, which sometimes take over an hour to process. I have tried several things that I've found here in the forums, but nothing seems to make much difference. I'm still pretty new to Oxygen and DITA so I'm sure there's a solution that I'm simply not seeing.

The ditamap I'm testing with has approximately 500 topics. I use a good number of conrefs throughout the document which are all stored in two files.

A few of the things I have done so far to try to speed things up:

1. I found the "copy media audio/video" files code and commented that out.
2. I increased the memory usage to 2000m.
3. I recently changed the output and temp locations from ${cfd} to dedicated output folders for each of our writers. For example: repository/output/dan/out/pdf. I left the base directory set to ${cfd} (because I really don't know what that does).

I also recently noticed that the transform scenario is trying to search for files outside of our repository directory. That's how I noticed that videos were being copied into the temp folders. I'm thinking that if I knew how to stop the scenario from trying to find things outside of our repository, the process would speed up dramatically.

Here's our repository structure:

repository
+topics
++++01
++++02
++++03
++++conrefs
++++preface
+ditamaps
+relationshiptables
+partspages

Any help or advice is appreciated. If more details are needed, please ask and I'll provide what I can. Again, I'm still pretty new, so talk slowly. ;)

Cheers,
Dan
Radu
Posts: 9055
Joined: Fri Jul 09, 2004 5:18 pm

Re: PDF Transformation Time

Post by Radu »

Hi Dan,

What version of Oxygen are you using?

Please see some answers below:
> 1. I found the "copy media audio/video" files code and commented that out.
That was a good move. The copy media step still has some problems finding the right resources to copy to the output folder, especially when you have resources outside of the main DITA Map folder.
> 2. I increased the memory usage to 2000m.
That's also good. "-Xmx2000m" should be more than enough for your 500-topic map, which is not that large anyway.
> 3. I recently changed the output and temp locations from ${cfd} to dedicated output folders for each of our writers. For example: repository/output/dan/out/pdf. I left the base directory set to ${cfd} (because I really don't know what that does).
Keeping the output and temp folders separate is also a good decision. Are you working on SSD drives? That might speed things up.

So an hour to produce the PDF from 500 topics is still way too much though. It should take about 5-10 minutes.
Usually large delays are caused either by very intensive disk operations, or by CPU usage or by connections to remote web sites.
Are your DITA topics/maps DTD based or XML Schema based or Relax NG Based?
Are they very large topics? Are you running the DITA Open Toolkit publishing from Oxygen or from a command line?
Are you using the DITA Open Toolkit bundled with Oxygen or an external manually installed DITA OT?
Looking at your folder structure, the DITA OT publishing engine has problems when you have references to resources outside of the folder where the main DITA Map is located. And in your case the main DITA Map is in the "ditamaps" folder. Ideally the main DITA Map would be directly in the "repository" folder. All other submaps can remain in the "ditamaps" folder.
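
For anyone who wants to check this on their own map, here is a minimal Python sketch (not part of the DITA OT or Oxygen) that lists the href and conref values in a map that resolve outside the folder holding that map. The function name is hypothetical, and it assumes plain relative paths without URL escaping; it only inspects the one map file it is given, not the submaps or topics it points to.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def refs_outside_map_folder(map_path):
    """Return href/conref values that resolve outside the map's own folder."""
    map_dir = os.path.realpath(os.path.dirname(map_path))
    outside = []
    for elem in ET.parse(map_path).iter():
        for attr in ("href", "conref"):
            value = elem.get(attr)
            if not value:
                continue
            # Skip absolute URLs such as http://... (assumption: local
            # references are written as relative paths).
            if urlparse(value).scheme:
                continue
            # Drop the "#topicid/elementid" fragment, keep the file part.
            target = value.split("#", 1)[0]
            if not target:
                continue
            resolved = os.path.realpath(os.path.join(map_dir, target))
            if os.path.commonpath([map_dir, resolved]) != map_dir:
                outside.append(value)
    return outside
```

Run against a map stored in "ditamaps" that points at "../topics/...", this would report every such topic reference, which is the situation described above.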

Also, if you start the DITA OT from Oxygen, you can edit the transformation scenario and, in the "Additional arguments" field of the "Advanced" tab, set -logger org.apache.tools.ant.listener.ProfileLogger. You should also go to the Preferences->DITA/Logging page and choose to always show the console output. The console output should then also show how long each target took.
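
Once the console output contains the timings, a small script can rank the targets by duration. The following is only a sketch: it assumes each finished target is logged on one line shaped like "Target NAME: finished DATE (1234ms)", which is how Ant's ProfileLogger normally reports them, and the function name is made up for illustration. If your console output looks different, the regular expression needs adjusting.

```python
import re

# Assumed line shape (adjust if your console output differs):
#   Target gen-list: finished Thu Jan 22 09:01:07 CET 2009 (6938ms)
FINISHED = re.compile(r"^Target (?P<name>\S+): finished .*\((?P<ms>\d+)ms\)\s*$")


def slowest_targets(log_text, top=5):
    """Return up to `top` (target name, milliseconds) pairs, slowest first."""
    timings = []
    for line in log_text.splitlines():
        match = FINISHED.match(line.strip())
        if match:
            timings.append((match.group("name"), int(match.group("ms"))))
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top]
```

Paste the saved console output into a text file, read it, and pass the text to slowest_targets() to see which stages dominate the run.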

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
thedantanner
Posts: 42
Joined: Thu Oct 27, 2016 11:13 pm

Re: PDF Transformation Time

Post by thedantanner »

I was able to get the build time down to ten minutes or less by a simple restructuring of the output files. Instead of: "repository/output/dan/out", I changed it to "repository/output_dan/out". I suppose the OT was having trouble with that "dan" subfolder.

Since ten minutes is what is to be expected, I'll probably be satisfied and move on to other things. I'll still answer all of your questions, just in case you spot other ways to optimize.

We're using Oxygen 19.1.
> Are your DITA topics/maps DTD based or XML Schema based or Relax NG Based?
XML.
> Are they very large topics? Are you running the DITA Open Toolkit publishing from Oxygen or from a command line?
Topics are quite small, usually tasks with ten steps or less, and we are publishing directly from Oxygen.
> Looking at your folder structure, the DITA OT publishing engine has problems when you have references to resources outside of the folder where the main DITA Map is located. And in your case the main DITA Map is in the "ditamaps" folder. Ideally the main DITA Map would be directly in the "repository" folder. All other submaps can remain in the "ditamaps" folder.
I think we're on the same page here, and we are set up how you are describing.

repository folder (bookmaps)
-+ditamaps folder (all ditamaps)
-+topics
---+01 (section one topic files)
---+02 (section two topic files)... there are seven of these numbered folders
---+conrefs (a single conref ditamap containing two topic files, one task and one concept)
> set -logger org.apache.tools.ant.listener.ProfileLogger. You should also go to the Preferences->DITA/Logging page and choose to always show the console output.
I do have the console output displayed already. The output does freeze in the same places every time, once on [gen-list] serializing job specification, and then on several topics throughout the various processes. I've inspected these files and nothing about them looks odd.

What is the benefit of setting that additional argument you suggested?

As always, thanks for all of your help here, Radu. It's truly appreciated.
Dan
Radu
Posts: 9055
Joined: Fri Jul 09, 2004 5:18 pm

Re: PDF Transformation Time

Post by Radu »

Hi Dan,

I have no idea why that change to the output folder made any difference. You are running the processing on the local disk and not on some shared network drive, right?
And your topics are XML Schema based, right? Processing for DTD-based topics is usually faster because the DTDs in use are cached.
Can you disable your network connection before running the transformation? Usually the transformation process would need no resource from the web and I'm curious if the processing in your case will complain that certain network resources are not accessible.

Regards,
Radu
thedantanner
Posts: 42
Joined: Thu Oct 27, 2016 11:13 pm

Re: PDF Transformation Time

Post by thedantanner »

> I have no idea why that change to the output folder made any difference. You are running the processing on the local disk and not on some shared network drive, right?
Actually, no. The repository lives on a shared network drive.
> And your topics are XML Schema based right? Processing for DTD-based topics is usually faster as it uses some special caching of the used DTDs
I think I was wrong about this. Being pretty new to all this, I assumed it was XML schema. Now that I look at a topic file I see that I was probably wrong. These are the first two lines in the code:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "technicalContent/dtd/task.dtd"[
]>
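
A DOCTYPE declaration like the one above is what marks a topic as DTD-based. As a rough illustration, the check can be scripted. The sketch below uses a hypothetical function name and simple text matching on the start of the file; it assumes XML Schema based topics carry an xsi:noNamespaceSchemaLocation or xsi:schemaLocation attribute and Relax NG based topics carry an xml-model processing instruction, which are the usual conventions but not the only possible ones.

```python
import re


def detect_grammar(xml_text):
    """Guess whether a topic is 'dtd', 'rng', 'xsd', or 'unknown'."""
    head = xml_text[:4096]  # the declarations sit at the top of the file
    if re.search(r"<!DOCTYPE\s", head):
        return "dtd"
    if re.search(r"<\?xml-model\b[^>]*(relaxng\.org|\.rn[gc])", head):
        return "rng"
    if re.search(r"xsi:(?:noNamespaceSchemaLocation|schemaLocation)\s*=", head):
        return "xsd"
    return "unknown"
```

Applied to the two lines quoted above, this returns "dtd".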
Radu
Posts: 9055
Joined: Fri Jul 09, 2004 5:18 pm

Re: PDF Transformation Time

Post by Radu »

Hi,

The DITA Open Toolkit processing is very disk-intensive: each stage reads the entire content of the transformation temporary files folder, modifies it, and then writes it back. I would suggest having at least the transformation temporary files folder on your local disk drive.
Also, having that shared network drive means that sometimes it may run faster and sometimes slower, depending on the network load at those times.
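
To get a feel for how much the storage location matters before changing the scenario, a crude timing test can help. The Python sketch below is not how the DITA OT works internally; it only imitates the read, modify, write-back pattern described above with a few hundred small files, and the function name and default sizes are arbitrary assumptions.

```python
import os
import tempfile
import time


def time_read_modify_write(folder, file_count=200, passes=3, size=4096):
    """Time a crude read-modify-write workload on small files under `folder`."""
    start = time.perf_counter()
    # The scratch directory is created inside `folder` and removed afterwards.
    with tempfile.TemporaryDirectory(dir=folder) as work:
        paths = [os.path.join(work, f"topic{i}.xml") for i in range(file_count)]
        for path in paths:
            with open(path, "wb") as f:
                f.write(b"x" * size)
        # Each pass reads every file back, changes it, and rewrites it,
        # loosely mimicking one processing stage.
        for _ in range(passes):
            for path in paths:
                with open(path, "rb") as f:
                    data = f.read()
                with open(path, "wb") as f:
                    f.write(data + b"\n")
    return time.perf_counter() - start
```

Call it once with a folder on the local disk and once with a folder on the network share, then compare the two numbers. A large gap suggests the temporary files folder should move to local storage.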

Regards,
Radu
thedantanner
Posts: 42
Joined: Thu Oct 27, 2016 11:13 pm

Re: PDF Transformation Time

Post by thedantanner »

Wow! One minute and one second when I changed my output destinations to a local folder.

If that's not a great result, I don't know what is.

Thanks for your continued help and advice!

Dan
Radu
Posts: 9055
Joined: Fri Jul 09, 2004 5:18 pm

Re: PDF Transformation Time

Post by Radu »

Hi Dan,

I'm glad this works better for you. A future version of the publishing engine will probably keep more data in memory and remove the need for fast local storage, because right now the speed of the local storage is indeed a bottleneck.

Regards,
Radu