Re: [xml-dev] hashing



md5sum computes a cryptographic hash of each file using the MD5 algorithm.
It's not fast as hashes go, but it will do what you want.  It's available on
Linux, in Cygwin, and probably elsewhere.

In a reasonable command shell, where unix commands are available along with
md5sum,

md5sum *.xml | sort

will sort the output by hash, which puts duplicate files on neighboring lines.
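
To print only the duplicates, the same pipeline can be extended with uniq.
This is a minimal sketch, not a definitive recipe: it assumes GNU coreutils
(the -w and --all-repeated options of uniq are GNU extensions), and the
function name find_dupes is made up for illustration.  It only catches files
that are byte-for-byte identical; documents that differ only in whitespace or
attribute order will hash differently.

```shell
# find_dupes is a hypothetical helper name, not a standard command.
# Assumes GNU coreutils: md5sum prints a 32-character hex digest at the
# start of each line, and uniq supports -w and --all-repeated.
find_dupes() {
    # Hash every file given as an argument, sort by digest, then keep only
    # lines whose first 32 characters (the digest) occur more than once.
    # Groups of identical files are separated by a blank line.
    md5sum -- "$@" | sort | uniq -w32 --all-repeated=separate
}

# Example use, run from the directory holding the documents:
#   find_dupes *.xml
```

Each paragraph of output is one group of identical files; files with no
duplicate are not listed.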

Jeff

----- Original Message ----- 
From: "Eric Hanson" <eric@...>
To: <xml-dev@...>
Sent: Thursday, April 29, 2004 12:58 PM
Subject: [xml-dev] hashing


> I have a large collection of XML documents, and want to find and
> group any duplicates.  The obvious but slow way of doing this is
> to just compare them all to each other.  Is there a better
> approach?
>
> Particularly, are there any APIs or standards for "hashing" a
> document so that duplicates could be identified in a similar way
> to what you'd do with a hash table?


 