Comparing the output from two PDF Chemistry versions

chrispitude
Posts: 338
Joined: Thu May 02, 2019 2:32 pm

Post by chrispitude » Tue Feb 23, 2021 3:10 pm

Hi folks,

I wanted a way to compare the PDF output of a new PDF Chemistry version against the output of my current version, so I wrote this utility:

https://github.com/chrispy-snps/compare-pdf-images

This utility compares two PDF files by rendering them to bitmap images (multipage TIFF files) and then comparing the images for pixel-level differences. This catches cosmetic CSS differences such as changes in padding, fonts, or styling.
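
As a rough illustration of the pixel-level comparison step (this is not the code in compare_pdf_images.pl), here is a minimal Python sketch. It assumes each page has already been rendered to a grid of grayscale values; the `Page` representation and the function names `count_differing_pixels` and `diff_pages` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# Assumption: one rendered page is a list of rows of grayscale values (0-255).
Page = List[List[int]]


def count_differing_pixels(a: Page, b: Page) -> int:
    """Count the positions where two same-sized pages differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages have different dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def diff_pages(old: List[Page], new: List[Page]) -> List[Tuple[int, int]]:
    """Return (page_number, differing_pixel_count) for each changed page.

    Page numbers are 1-based. A page-count mismatch raises an error instead
    of being compared page by page.
    """
    if len(old) != len(new):
        raise ValueError(f"page count differs: {len(old)} vs {len(new)}")
    changed = []
    for number, (a, b) in enumerate(zip(old, new), start=1):
        n = count_differing_pixels(a, b)
        if n:
            changed.append((number, n))
    return changed


if __name__ == "__main__":
    # Two tiny 2x2 "pages"; only one pixel on page 2 differs.
    old_doc = [[[255, 255], [255, 0]], [[0, 0], [0, 0]]]
    new_doc = [[[255, 255], [255, 0]], [[0, 0], [0, 255]]]
    print(diff_pages(old_doc, new_doc))
```

An exact pixel match is the strictest possible check. Whether some tolerance is needed (for example, for anti-aliasing noise) depends on the renderer and is not covered here.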

On top of the basic compare_pdf_images.pl script, I wrote an additional bash shell script at

https://github.com/chrispy-snps/compare ... ersions.sh

to run a full output regression test for all our books to see if I need to update our CSS to account for any changes in the new PDF Chemistry version. This also allows me to test if I can safely remove workarounds from our CSS once a bug is fixed.
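
The overall shape of such a regression run might look like the Python sketch below. It is not a port of the bash script linked above. It assumes one directory of PDFs per PDF Chemistry version, with matching file names for the same book, and the function name `run_regression` and the `same_output` callback are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_regression(
    old_dir: Path,
    new_dir: Path,
    same_output: Callable[[Path, Path], bool],
) -> Dict[str, List[str]]:
    """Compare same-named PDFs from two output directories.

    `same_output(old_pdf, new_pdf)` is a caller-supplied check (for example,
    a wrapper around an image-based comparison) that returns True when the
    two files are considered equivalent.
    """
    old = {p.name: p for p in old_dir.glob("*.pdf")}
    new = {p.name: p for p in new_dir.glob("*.pdf")}
    # Only books present in both directories can be compared.
    changed = [
        name
        for name in sorted(old.keys() & new.keys())
        if not same_output(old[name], new[name])
    ]
    return {
        "changed": changed,
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
    }
```

Passing the comparison in as a callback keeps the driver independent of how two PDFs are judged equal, so the same loop works for an image-based check or any other method.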

Dan
Posts: 485
Joined: Mon Feb 03, 2003 10:56 am

Post by Dan » Thu Feb 25, 2021 3:39 pm

Thank you for sharing! For our automated tests we use area tree dumps (XML fragments with all the details: coordinates, shapes, text position). Comparing the images is probably more intuitive.
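
A structural comparison of XML dumps could be sketched in Python roughly as follows. The thread does not show the actual area tree format or the test code, so the element and attribute names used in the usage example (`page`, `block`, `x`, `y`) are invented for illustration, as is the function name `diff_elements`.

```python
import xml.etree.ElementTree as ET
from typing import List


def diff_elements(a: ET.Element, b: ET.Element, path: str = "", index: int = 0) -> List[str]:
    """Recursively list differences in tag, attributes, text, and children."""
    here = f"{path}/{a.tag}[{index}]"
    if a.tag != b.tag:
        return [f"{here}: tag {a.tag!r} != {b.tag!r}"]
    diffs = []
    if a.attrib != b.attrib:
        diffs.append(f"{here}: attributes {a.attrib} != {b.attrib}")
    if (a.text or "").strip() != (b.text or "").strip():
        diffs.append(f"{here}: text differs")
    if len(a) != len(b):
        diffs.append(f"{here}: child count {len(a)} != {len(b)}")
    # Children are compared pairwise by position; extras are only counted above.
    for i, (child_a, child_b) in enumerate(zip(a, b)):
        diffs.extend(diff_elements(child_a, child_b, here, i))
    return diffs


if __name__ == "__main__":
    # Hypothetical fragments: the y coordinate of one block has shifted.
    old_xml = '<page><block x="10" y="20">Hello</block></page>'
    new_xml = '<page><block x="10" y="22">Hello</block></page>'
    for line in diff_elements(ET.fromstring(old_xml), ET.fromstring(new_xml)):
        print(line)
```

Unlike an image comparison, a report like this points at the element that moved, at the cost of needing to understand the dump format.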

Post by chrispitude » Fri Feb 26, 2021 6:43 pm

Thanks Dan! The nice thing about testing the final result visually is that (1) I didn't have to know much about the internals of PDF Chemistry (very complicated!), and (2) it's an end-to-end comparison from input to output. If you add more PDF compression some day, it will be covered by the validation.

I was actually wondering about contacting you offline to see if you'd want to use a variant of

https://github.com/chrispy-snps/compare ... ersions.sh

for your own regression tests. You could perhaps use the same .ditamap unit tests that you have now, although I suppose that's not a help if you're already comparing them via another method.

For us, the biggest challenge is that our company CSS must undo many things in the Oxygen default CSS, so as new Oxygen versions introduce changes, we must catch and correct any deviations from what writers expect. For example, if you introduce nice icons for various note types (...), I need to catch this and suppress them in our own CSS.

If there are enhancements I could make to any of this that would make this more valuable to you, please let me know!

Post by Dan » Mon Mar 01, 2021 12:28 pm

Speaking of notes, we are about to change them in 23.1. I hope they will not break your customization again :). We plan to stabilize the intermediate HTML structure and the default CSS starting from version 24.

For our regression tests we use the PDFBox Java library. We actually check differences between the intermediate FO files, the area trees, and the PDF stream (using PDFBox). Using PDFBox we can get images as well, similar to the command line utilities invoked from your script.

If you want to give us more details, you can contact me on the support email address.

Thank you,
Dan
