
RE: Removing duplicates not preceding vs. keys


Subject: RE: Removing duplicates not preceding vs. keys
From: Kay Michael <Michael.Kay@xxxxxxx>
Date: Fri, 7 Jul 2000 14:24:39 +0100

> a while ago I asked something about removing duplicates.
> Most of the answers I got concerned either using
>   <xsl:for-each select="//SPEECH[not(.=preceding::SPEECH)]">
> or
> <xsl:key name="sortKey" match="value" use="var" />
> 
> both of 'em work fine, but can anybody tell which one is more 
> favourable and why?

The "preceding" solution typically has O(n*n) performance: it involves
comparing each SPEECH with each SPEECH that precedes it, so as the number of
items doubles, elapsed time increases by a factor of four.

The "key" solution typically has O(n log(n)) performance: it involves
adding each item to an index and looking up each item in that index. So when
the number of items doubles, elapsed time increases by a factor of only about
2.1.
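
To show where these doubling factors come from, here is a rough worked calculation. It assumes idealised cost models T(n) = c n^2 and T(n) = c n log2(n), with c an arbitrary constant; real processors will only approximate these.

```latex
\[
\frac{T_{\text{preceding}}(2n)}{T_{\text{preceding}}(n)}
  = \frac{c\,(2n)^2}{c\,n^2} = 4
\]
\[
\frac{T_{\text{key}}(2n)}{T_{\text{key}}(n)}
  = \frac{c\,2n\,\log_2(2n)}{c\,n\,\log_2 n}
  = 2\left(1 + \frac{1}{\log_2 n}\right)
\]
% For n = 2^{10} (about a thousand items) the factor is 2(1 + 1/10) = 2.2;
% for n = 2^{20} (about a million items) it is 2(1 + 1/20) = 2.1.
```

Under these assumptions the factor for the key solution approaches 2 as n grows, while the factor for the "preceding" solution stays at 4.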

That means that the "preceding" solution may be faster for small files, but
as the files get bigger, the "key" solution will win.
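
For reference, the key declaration quoted in the question does not remove duplicates on its own; it has to be combined with a test that picks one node per key value. Here is a minimal sketch of one common way to do that in XSLT 1.0 (often called Muenchian grouping). It assumes the duplicates are SPEECH elements compared by their string value, as in the "preceding" example; the key name speech-by-value is made up for illustration, and the output format is arbitrary.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index every SPEECH element by its string value.
       The key name is hypothetical, chosen for this sketch. -->
  <xsl:key name="speech-by-value" match="SPEECH" use="."/>

  <xsl:template match="/">
    <!-- Keep a SPEECH only if it is the first node, in document order,
         that the key returns for its own value. -->
    <xsl:for-each select="//SPEECH[generate-id() =
                          generate-id(key('speech-by-value', .)[1])]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Each SPEECH is looked up once through the key instead of being compared against every preceding SPEECH, which is where the better scaling would come from, provided the processor implements keys with an index.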

Of course, this is all based on assumptions on how the implementations work:
assumptions that are reasonable, but not necessarily true of all products.

Mike Kay


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


