[xsl] Post-Processing PDF For Back-Of-The-Book Indexes


Subject: [xsl] Post-Processing PDF For Back-Of-The-Book Indexes
From: "W. Eliot Kimber" <eliot@xxxxxxxxxx>
Date: Sun, 10 Feb 2002 09:20:32 -0600

In reference to an earlier thread about eliminating duplicate page
numbers in back-of-the-book indexes generated by XSL-FO styles, I have
successfully done this using the free PJ library from www.Etymon.com.
With this library you can interact with PDF at the lowest level of
granularity (individual PDF operators within a page). In my case, I was
able to get to the individual lines of the index pages, find sequences
of repeated numbers, remove them from the document, and write a new PDF
document. It required about 150 lines of Python (using the Jython
interpreter to provide access to the PJ Java library) to implement the
initial functionality I needed.
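
As an illustration only, the core "find sequences of repeated numbers
and remove them" step can be sketched in plain Python. This is not the
code described above and it does not use the PJ library: it assumes the
text of one index line has already been extracted as a string of the
form "term, 12, 12, 13", and the function name and line format are
hypothetical.

```python
import re

# Assumed line shape: an entry term ending in a non-digit, then a
# comma-separated list of page numbers running to the end of the line.
_PAGE_LIST = re.compile(
    r"^(?P<term>.*?\D),\s*(?P<pages>\d+(?:\s*,\s*\d+)*)\s*$"
)


def dedupe_page_numbers(line):
    """Collapse runs of identical adjacent page numbers in one index line.

    Lines that do not match the assumed "term, n, n, ..." shape are
    returned unchanged.
    """
    match = _PAGE_LIST.match(line)
    if match is None:
        return line

    kept = []
    for page in re.split(r"\s*,\s*", match.group("pages")):
        # Keep a page number only if it differs from the one before it.
        if not kept or kept[-1] != page:
            kept.append(page)

    return "%s, %s" % (match.group("term"), ", ".join(kept))


if __name__ == "__main__":
    print(dedupe_page_numbers("widgets, 12, 12, 13, 13, 13, 20"))
```

In a real PDF the page numbers are spread across text-showing operators
in the page content stream, so locating them and writing the result back
is the harder part, and that part depends on the PDF library in use.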

I'm not quite ready to post code, since I need to refine what I've
written and do more testing, but I wanted to report this initial success
because I know others are struggling with this same problem.

Cheers,

Eliot
-- 
W. Eliot Kimber, eliot@xxxxxxxxxx
Consultant, ISOGEN International

1016 La Posada Dr., Suite 240
Austin, TX  78752 Phone: 512.656.4139

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list