Subject: Re: [xsl] Managing debug logging in complex transforms: what do people do?
From: Graydon <graydon@xxxxxxxxx>
Date: Wed, 26 Mar 2014 15:40:54 -0400

On Wed, Mar 26, 2014 at 02:56:08PM -0400, Liam R E Quin scripsit:
> On Mon, 2014-03-24 at 18:02 -0400, Graydon wrote:
> [...]
> > Single digit integer minutes, quite often, outside debug mode.
> I remember a client with a transformation that was taking I think 20
> minutes on a relatively small file; in the end I preprocessed the style
> sheet to add a trace message at the start of each template, and ran that
through a simple program (I might've written a Perl script to do it) that
> timestamped each line of the trace.
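The preprocessed templates would presumably have ended up looking something
like this (a sketch; the match pattern is invented for illustration):

```xml
<xsl:template match="section">
  <!-- injected by the preprocessor: one trace line per template entry,
       written to stderr, where the timestamping script picks it up -->
  <xsl:message>ENTER template match="section"</xsl:message>
  <!-- ...original template body... -->
</xsl:template>
```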

We used Saxon's -T "trace" mode and processed the output to identify
miscreant templates, which were then optimized.  Some of them were
pretty seriously miscreant, too; it's difficult to re-group mixed
content efficiently when you're moving parts of it somewhere else, and
there was a lot of external lookup on top of that.  I found that the
tip-over point between "load the file into a big variable and search it
with XPath" and "load the file into a big variable and index it with
keys" was around a thousand entries, in favour of keys.
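For anyone hitting the same wall, the difference looks roughly like this
(element names, key name, and the lookup file are all invented for
illustration):

```xml
<!-- Lookup table loaded once into a variable -->
<xsl:variable name="lookup" select="document('lookup.xml')"/>

<!-- Fine for small tables: a linear scan of the variable per lookup -->
<xsl:template match="ref" mode="xpath-lookup">
  <xsl:value-of select="$lookup/table/entry[@id = current()/@target]/label"/>
</xsl:template>

<!-- Wins past roughly a thousand entries: declare a key, indexed once -->
<xsl:key name="entry-by-id" match="entry" use="@id"/>

<xsl:template match="ref" mode="key-lookup">
  <xsl:variable name="target" select="@target"/>
  <!-- key() searches the document containing the context node, so in
       XSLT 1.0 we switch context into $lookup with a one-node for-each -->
  <xsl:for-each select="$lookup">
    <xsl:value-of select="key('entry-by-id', $target)/label"/>
  </xsl:for-each>
</xsl:template>
```

(In XSLT 2.0 the context dance goes away: key() takes the target document
as a third argument.)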

The original run time, on Cygwin instead of real POSIX, could be over
8 hours.  It got down to ~10 minutes for that set of input with various
process changes, along with improving the problematic templates.

These were often large, complex files -- legislative acts and
regulations, which vary from "don't fish there on Thursdays" to the
entire Income Tax Act -- where we had to do a lot of restructuring and
there were something like twelve or fourteen full passes through the
data, with most of the passes involved in properly building the
legislative citations for anything with a number.  The
single-digit-integer-minutes run times were pretty good for the input.
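Chaining that many passes is straightforward in XSLT 2.0 with modes and
temporary trees; something along these lines, with the pass names
invented for illustration:

```xml
<xsl:template match="/">
  <!-- each pass reads the previous pass's temporary tree -->
  <xsl:variable name="pass1">
    <xsl:apply-templates select="." mode="restructure"/>
  </xsl:variable>
  <xsl:variable name="pass2">
    <xsl:apply-templates select="$pass1" mode="number-provisions"/>
  </xsl:variable>
  <!-- ...further passes... -->
  <xsl:apply-templates select="$pass2" mode="build-citations"/>
</xsl:template>
```

The catch, relevant below, is that each of those variables is a complete
copy of the document held in memory.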

I think the combination of twelve passes and a 60 MB input file (they
were by no means all that big, but inevitably the larger ones had the
weirder problems) was working out to 12 passes x a roughly 5x in-memory
expansion of parsed XML x 60 MB = 3.6 GB, plus non-trivial debugging
overhead, for the Java heap, which crushed the debugger under swapping
load.  The non-debugging run seemed much better able to throw
intermediate results away once it was done with them; small inputs
debugged fine, and it was only the really big stuff that was hopeless.

> On the other hand I don't recommend optimizing something that doesn't
> yet work, as the more work you put into it, the less you'll be willing
> to rewrite altogether when you find parts that are wrong :-) and it's a
> waste of your time, of course.

"Premature optimization is the root of all evil."  Complete agreement
from me!

-- Graydon
