DITA-OT HTML5 processing in PDF Chemistry

Post by **chrispitude** » Sat Aug 01, 2020 5:45 pm

PDF Chemistry uses the HTML5 output from the DITA-OT as a starting point for its own publishing process. I really like that when I create my own HTML5 plugins to customize labels and structures, PDF Chemistry picks up the changes!

But the standard DITA-OT HTML5 transformation creates an entire directory structure, whereas PDF Chemistry generates a single merged file. So this makes me wonder:

How does PDF Chemistry create a merged file instead of a directory structure?
Are there preprocess/HTML5 XSLT templates I shouldn't touch, because they could break something PDF Chemistry does differently in its own HTML5 creation?
Are there any steps or templates that PDF Chemistry omits?

For example, I see that PDF Chemistry has DITA-specialized attributes in its merged HTML5 file to enable CSS styling, whereas the standard HTML5 output has these removed.

Thanks!

Post by **julien_lacour** » Thu Aug 06, 2020 5:51 pm

Hello Chris,

In fact, the PDF Chemistry transformation is based on the PDF XSL-FO transformation, but it uses stylesheets from the HTML5 DITA-OT plugin during preprocessing. You can see this workflow in the DITA-OT-DIR/plugins/com.oxygenxml.pdf.css/build.xml file.

The directory structure is specific to our transformation and is inspired by the pdf2 plugin (as explained above).
Most of the HTML5 XSLT templates are used by the pdf-css-html5 plugin (the one using Chemistry), so any modification to these stylesheets can affect the PDF output (and possibly the WebHelp Responsive output as well).
Some of the templates are modified from the HTML5 base; those are in DITA-OT-DIR/plugins/com.oxygenxml.pdf.css/xsl

You will also see the html5 stylesheet import inside DITA-OT-DIR/plugins/com.oxygenxml.pdf.css/xsl/merged2html5/html5.xsl (it is the parent stylesheet called to create the .merged.html file)

Regards,
Julien

Post by **chrispitude** » Sat Aug 22, 2020 4:02 pm

Julien, this is exactly the information I needed - thank you!

Post by **chrispitude** » Wed Sep 23, 2020 10:15 pm

For certain cross-references, we have a special title-only link text format specified with @outputclass:

<xref href="#tableid" outputclass="title"/>

We then have code applied by the dita.xsl.topicpull extension point to implement it:

  <!-- a cross-reference with outputclass=='title' always uses the target title text -->
  <xsl:template match="*[contains(@class, ' topic/table ') or contains(@class, ' topic/fig ')]"
                mode="topicpull:resolvelinktext" priority="20">
    <xsl:param name="linkElement" as="element()" tunnel="yes"/>
    <xsl:choose>

      <!-- business as usual if @outputclass != 'title' -->
      <xsl:when test="not($linkElement[contains(@outputclass, 'title')])">
        <xsl:next-match/>
      </xsl:when>

      <!-- force the title as the target text -->
      <xsl:otherwise>
        <xsl:variable name="target-text" as="xs:string*">
          <xsl:apply-templates
            select="*[contains(@class, ' topic/title ')]" mode="text-only"/>
        </xsl:variable>
        <xsl:value-of select="normalize-space(string-join($target-text, ''))"/>
      </xsl:otherwise>

    </xsl:choose>
  </xsl:template>
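
For context, a stylesheet like this is contributed to the topicpull step through the plugin's plugin.xml. The following is only a minimal sketch: the plugin id `com.example.xref-title` and the file name `xsl/topicpull-title.xsl` are hypothetical placeholders, not names taken from this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical plugin id; replace with your own. -->
<plugin id="com.example.xref-title">
  <!-- Contribute a stylesheet to the preprocess topicpull step.
       The file path is a placeholder for the stylesheet that holds
       the mode="topicpull:resolvelinktext" template. -->
  <feature extension="dita.xsl.topicpull" file="xsl/topicpull-title.xsl"/>
</plugin>
```

After adding or changing a plugin.xml, the plugin has to be (re)integrated into the DITA-OT before the extension takes effect.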

PDF Chemistry ignores my mode="topicpull:resolvelinktext" template: I do not see its result in the merged XML produced by the merged2merged stage, and I see that PDF Chemistry has its own mode="insertReferenceTitle" link-text machinery instead, at

<oxygen>/frameworks/dita/DITA-OT3.x/plugins/com.oxygenxml.pdf.css/xsl/merged2merged/merged-links.xsl

Now here's where I am confused. In my plugin, if I extend com.oxygenxml.pdf.css.xsl.merged2merged and add the following simple template, then *my* mode="topicpull:resolvelinktext" code is used!

<xsl:template match="*" mode="insertReferenceTitle" priority="100">
  <xsl:next-match/>
</xsl:template>

But if I comment that template out, then PDF Chemistry's mode="insertReferenceTitle" code is used instead. The template above shouldn't change anything; it should simply fall through to the next template. But its presence or absence makes my template get applied or not applied, respectively.

What would cause this?

Post by **Dan** » Thu Sep 24, 2020 12:53 pm

I am not sure why this happens.

You can try debugging the XSLT transformation with oXygen. It is a bit complicated to set up, but here is how it can be done.
The file you need to debug is:

DITA-OT/plugins/com.oxygenxml.pdf.css/xsl/merged2merged/merged.xsl

First, run a DITA transformation with the parameter "clean.temp" set to "no".
Then, in Oxygen XML Editor:

1. Open the stage1.xml file from the temporary directory.
2. Open the merged.xsl file and comment out the line importing the template extension point: <xsl:import href="template:xsl/com.oxygenxml.pdf.css.xsl.merged2merged"/>
3. In the file merged-all.xsl from the same directory, comment out: <xsl:import href="../review/review-pis-to-elements.xsl"/>
4. Validate the merged.xsl file and fix the errors by removing the calls to missing templates ("<xsl:call-template name="add-review-pis-for-root"/>") and by replacing calls to the function getImage(size) with zero.
5. In merged-all.xsl, comment out the include of merged-tables.xsl.
6. Run a transformation; it should complete without errors.
7. Now run the transformation in the XSLT debugger.

Place breakpoints in your template and watch what the call stack looks like.

Many regards,
Dan

Post by **chrispitude** » Mon Sep 28, 2020 5:34 pm

Hi Dan,

Many thanks for your instructions! I will likely need them at some point to figure something out.

But fortunately, I found my issue through code inspection. I realized that preprocess runs first, and merged2merged runs later and may replace the link text that preprocess generated. My issue was that my <xsl:next-match/> override of the mode="insertReferenceTitle" template did not pass the parameters down to the next-matching template. Without them, that template fell through and left the previous content from the mode="topicpull:resolvelinktext" template in place.
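
The parameter behavior can be reproduced in isolation. The following is a minimal, self-contained XSLT 2.0 sketch and not PDF Chemistry's code: the mode name `demo` and the parameter name `label` are made up for illustration. The point is that non-tunnel parameters are not forwarded by <xsl:next-match/> unless they are passed explicitly with <xsl:with-param>, whereas tunnel parameters are forwarded automatically.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Entry point: call the "demo" mode with an explicit parameter. -->
  <xsl:template match="/">
    <xsl:apply-templates select="*" mode="demo">
      <xsl:with-param name="label" select="'from caller'"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- High-priority pass-through. It must declare the parameter and
       forward it; a bare <xsl:next-match/> would drop it. -->
  <xsl:template match="*" mode="demo" priority="100">
    <xsl:param name="label"/>
    <xsl:next-match>
      <xsl:with-param name="label" select="$label"/>
    </xsl:next-match>
  </xsl:template>

  <!-- The template that does the real work. Its default value is used
       whenever the parameter is not passed in. -->
  <xsl:template match="*" mode="demo" priority="1">
    <xsl:param name="label" select="'default'"/>
    <xsl:value-of select="$label"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any XML document, this writes "from caller". If the <xsl:with-param> inside <xsl:next-match> is removed, the lower-priority template silently falls back to "default", which is the same kind of change in behavior described above.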

Also during code inspection, I noticed that the mode="insertReferenceTitle" template leaves the existing content unmodified if there is a DITA-OT "usertext" PI in place. So, I updated my topicpull code to set this:

  <!-- add usertext PI for xrefs with outputclass='title' -->
  <xsl:template match="*[contains(@outputclass, 'title')]" mode="topicpull:getlinktext">
    <xsl:param name="targetElement" as="element()"/>
    <xsl:apply-templates select="." mode="topicpull:add-usertext-PI"/>  <!-- I had to add this part -->
    <xsl:next-match>
      <xsl:with-param name="targetElement" as="element()" select="$targetElement"/>
    </xsl:next-match>
  </xsl:template>

and now PDF Chemistry leaves the preprocess-derived text in place! And I did not have to use any PDF Chemistry extension points after all.
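
As a rough illustration of the idea (not the actual code in merged-links.xsl), a later stage can skip regenerating link text when the marker is present. This sketch assumes the PI is serialized as `<?ditaot usertext?>`; verify that against the topicpull stylesheets of your DITA-OT version. It also matches a plain `xref` element name instead of the DITA @class attribute, purely for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity copy for everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- xref carrying the assumed <?ditaot usertext?> marker:
       keep the existing link text untouched. -->
  <xsl:template
      match="xref[processing-instruction('ditaot')[normalize-space(.) = 'usertext']]"
      priority="10">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Any other xref: replace its content with regenerated text.
       The literal string is a placeholder for real link-text logic. -->
  <xsl:template match="xref">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:text>[regenerated link text]</xsl:text>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```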

Post by **Dan** » Tue Sep 29, 2020 9:17 am

Such a complex situation... Glad you worked it out!
