
Comparing the output from two PDF Chemistry versions

Posted: Tue Feb 23, 2021 3:10 pm
by chrispitude
Hi folks,

I wanted a way to test a new version of PDF Chemistry against my current version of PDF Chemistry to see if the PDF output was different. So I wrote this utility:

This utility compares two PDF files by rendering them to bitmap images (multipage TIFF files) and then comparing the images for pixel-level differences. In this way, cosmetic CSS differences such as changes in padding, font, or style are detected.
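As a rough illustration of the pixel-level comparison step (not the actual utility), here is a minimal Python sketch. It assumes the pages have already been rendered to 8-bit grayscale bitmaps by some external tool. The `Page` type and both function names are hypothetical and exist only for this example.

```python
from typing import Dict, List, NamedTuple


class Page(NamedTuple):
    """One rendered page (hypothetical type for this sketch)."""
    width: int
    height: int
    pixels: bytes  # one byte per pixel, row-major grayscale


def count_diff_pixels(a: Page, b: Page) -> int:
    """Return how many pixels differ between two pages.

    Pages with different dimensions are treated as entirely different.
    """
    if (a.width, a.height) != (b.width, b.height):
        return max(a.width * a.height, b.width * b.height)
    return sum(1 for x, y in zip(a.pixels, b.pixels) if x != y)


def compare_documents(old: List[Page], new: List[Page]) -> Dict[int, int]:
    """Map 1-based page number to differing pixel count.

    Only pages that differ are reported. A value of -1 marks a page
    that exists in only one of the two documents.
    """
    diffs: Dict[int, int] = {}
    for i in range(max(len(old), len(new))):
        if i >= len(old) or i >= len(new):
            diffs[i + 1] = -1
            continue
        n = count_diff_pixels(old[i], new[i])
        if n:
            diffs[i + 1] = n
    return diffs
```

An empty result means the two renderings are pixel-identical. Anything else lists the pages worth opening side by side.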

On top of the basic script, I wrote an additional bash shell script at ...

to run a full output regression test for all our books to see if I need to update our CSS to account for any changes in the new PDF Chemistry version. This also allows me to test if I can safely remove workarounds from our CSS once a bug is fixed.
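For readers who want to try something similar, a full regression run mostly comes down to pairing each book's output from the old version with its output from the new version. The sketch below is Python rather than the bash script mentioned above. It assumes both sets of PDFs sit in two directories under matching file names. The function name and directory layout are assumptions made for this example.

```python
from pathlib import Path


def pair_outputs(old_dir, new_dir, suffix=".pdf"):
    """Pair files with the same name in two output directories.

    Returns (pairs, only_old, only_new), where pairs is a list of
    (old_path, new_path) tuples sorted by file name, and the other two
    are sorted lists of names found on one side only.
    """
    old = {p.name: p for p in Path(old_dir).glob(f"*{suffix}")}
    new = {p.name: p for p in Path(new_dir).glob(f"*{suffix}")}
    common = sorted(old.keys() & new.keys())
    pairs = [(old[name], new[name]) for name in common]
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    return pairs, only_old, only_new
```

Each pair can then be handed to whatever comparison is in use. The one-sided lists flag books that failed to build under one of the two versions.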

Re: Comparing the output from two PDF Chemistry versions

Posted: Thu Feb 25, 2021 3:39 pm
by Dan
Thank you for sharing! For our automated tests we use area tree dumps (XML fragments with all the details: coordinates, shapes, text position). Comparing the images is probably more intuitive.
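To make the contrast with image comparison concrete, here is a minimal Python sketch of a structural diff between two XML dumps. The real area tree format is not shown in this thread, so the element and attribute names in the usage example (`page`, `block`, `x`, `y`) are purely hypothetical, and this is not the actual test harness.

```python
import xml.etree.ElementTree as ET


def diff_trees(a, b, path=""):
    """Recursively compare two XML elements and return a list of differences.

    Compares tag names, attributes, stripped text, and child count,
    walking children pairwise in document order.
    """
    if a.tag != b.tag:
        return [f"{path or '/'}: tag {a.tag!r} != {b.tag!r}"]
    here = f"{path}/{a.tag}"
    diffs = []
    if a.attrib != b.attrib:
        diffs.append(f"{here}: attributes {a.attrib} != {b.attrib}")
    if (a.text or "").strip() != (b.text or "").strip():
        diffs.append(f"{here}: text differs")
    children_a, children_b = list(a), list(b)
    if len(children_a) != len(children_b):
        diffs.append(f"{here}: {len(children_a)} children != {len(children_b)}")
    for child_a, child_b in zip(children_a, children_b):
        diffs.extend(diff_trees(child_a, child_b, here))
    return diffs


# Usage with made-up fragments: a block that moved down by 4 units.
old = ET.fromstring('<page><block x="10" y="20">Note</block></page>')
new = ET.fromstring('<page><block x="10" y="24">Note</block></page>')
for line in diff_trees(old, new):
    print(line)
```

A diff like this reports exactly which coordinate changed, whereas an image diff only shows that some pixels moved.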

Re: Comparing the output from two PDF Chemistry versions

Posted: Fri Feb 26, 2021 6:43 pm
by chrispitude
Thanks Dan! The nice thing about testing the final result visually is (1) I didn't have to know very much about the insides of PDF Chemistry (very complicated!), and (2) it's an end-to-end comparison from input to output. If you add more PDF compression some day, it will get included in the validation.

I was actually wondering about contacting you offline to see if you'd want to use a variant of ...

for your own regression tests. You could perhaps use the same .ditamap unit tests that you have now, although I suppose that's not a help if you're already comparing them via another method.

For us, the biggest challenge is that our company CSS must undo many things in the Oxygen default CSS, and so as new Oxygen versions introduce new changes, we must catch and correct any deviations from what writers expect. For example, if you introduce nice icons for various note types (...), I need to catch this and incrementally suppress them in our own CSS.

If there are enhancements I could make to any of this that would make this more valuable to you, please let me know!

Re: Comparing the output from two PDF Chemistry versions

Posted: Mon Mar 01, 2021 12:28 pm
by Dan
Speaking of notes, we are about to change them in 23.1; I hope this will not break your customization again :). We plan to stabilize the intermediate HTML structure and the default CSS starting from version 24.

For our regression tests we use the PDFBox Java library. We actually check differences between the intermediate FO files, the area trees, and the PDF stream (using PDFBox). Using PDFBox we can get images as well, similar to the command line utilities invoked from your script.

If you want to give us more details, you can contact me on the support email address.

Thank you,