Re: [xml-dev] hashing
md5sum is a command-line tool that computes a cryptographic hash of each
file using the MD5 algorithm. It's not fast, but it will do what you want.
It's available on Linux, in Cygwin, and probably elsewhere.
In a reasonable command shell, where Unix commands are available along with
md5sum,
md5sum *.xml | sort
will put the duplicate files on neighboring lines, because byte-identical
files produce identical checksums and sort brings equal checksums together.
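If you'd rather see only the duplicates, grouped together, something like
the following might work. It's a sketch, not a tested recipe for every
system: it assumes GNU coreutils (the -w and --all-repeated options of uniq
are GNU extensions), and find_dupes is just a name made up for this example.
It only catches byte-for-byte identical files; two documents that differ
only in whitespace or attribute order will hash differently.

```shell
# Hypothetical helper: print groups of byte-identical files among the
# arguments, one "checksum  filename" line per file, groups separated
# by a blank line. Files with no duplicate are not printed.
# Assumes GNU md5sum, sort, and uniq; unusual file names (containing
# newlines or backslashes) are not handled.
find_dupes() {
    # An MD5 digest is 32 hex characters, so uniq compares only that prefix.
    md5sum -- "$@" | sort | uniq -w32 --all-repeated=separate
}
```

Called as "find_dupes *.xml", it lists each set of duplicates as a block of
neighboring lines.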
Jeff
----- Original Message -----
From: "Eric Hanson" <eric@...>
To: <xml-dev@...>
Sent: Thursday, April 29, 2004 12:58 PM
Subject: [xml-dev] hashing
> I have a large collection of XML documents, and want to find and
> group any duplicates. The obvious but slow way of doing this is
> to just compare them all to each other. Is there a better
> approach?
>
> Particularly, are there any APIs or standards for "hashing" a
> document so that duplicates could be identified in a similar way
> to what you'd do with a hash table?