RE: Re: [xsl] What is a better word for "de-duplication"?
Subject: RE: Re: [xsl] What is a better word for "de-duplication"?
From: cknell@xxxxxxxxxx
Date: Mon, 28 Aug 2006 19:26:02 -0400
All sorts of terms with ambiguous or impenetrable meanings don't help. They muddy the water. A tool need not be pretty to be useful. Is there any doubt about the meaning of "de-duplication"? Not from where I sit.

-- 
Charles Knell
cknell@xxxxxxxxxx - email

-----Original Message-----
From: Andrew Franz <afranz0@xxxxxxxxxxxxxxxx>
Sent: Tue, 29 Aug 2006 08:12:40 +1000
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] What is a better word for "de-duplication"?

Wendell Piez wrote:

> At 03:33 PM 8/28/2006, Andrew wrote:
>
>> Wendell Piez wrote:
>>
>>> Dear Dimitre,
>>>
>>> At 08:41 PM 8/27/2006, you wrote:
>>>
>>>> I want to use a single, short word to express the act of removing
>>>> duplicates from a node-set. I remember seing the word "de-duplication"
>>>> used, however it sounds ugly.
>>>
>>>
>> Normalisation
>
>
> Normalization (or 'normalisation' for those who prefer British
> orthography) would rather be the general process of transforming a set
> of values into their normalized forms. So,
>
> <date value="2006">May Day 2006</date>
> <date value="2006-05-01"/>
> <date value="5-1-2006">May 1 2006</date>
>
> might be normalized as
>
> <date value="2006-05-01">May 1 2006</date>
> <date value="2006-05-01">May 1 2006</date>
> <date value="2006-05-01">May 1 2006</date>
>
> but this would not deduplicate them.
>
> These are very different problems, especially for XSLT. Generally
> speaking, deduplicating requires normalization first since
> deduplication works only over canonical forms (or comparing them to
> see which are duplicates becomes very difficult).
>
> Cheers,
> Wendell

Yes, this is one meaning of 'normalisation'. But 'normalisation' is richer and deeper than that. Think about relational database theory.
2NF = A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

In the above example:

<date value="2006">May Day 2006</date>
<date value="2006-05-01"/>
<date value="5-1-2006">May 1 2006</date>

becomes:

<standardDate id="x" year="2006" month="5" day="1" />

plus:

<date id="x" format="t yyyy">May Day</date>
<date id="x" format="yyyy-mm-dd" />
<date id="x" format="Mmm dd yyyy" />

I submit that these are *not* the same. In your example, you simply removed the 'inconvenient' differences. In the database normalisation, the commonalities are "normalised" or "factored" out as a basis for comparison. In this process (applied to XSLT perhaps), <date> has been "de-duplicated" into <standardDate> but there is no loss of information.

Why invent new terminology?
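
The distinction argued over in this thread can be shown in a short Python sketch (not XSLT, and not code from any of the posters). It assumes a hypothetical list of date strings loosely modelled on the <date> values quoted above, and an invented parsing rule (year-first versus month-day-year). The function names normalise, deduplicate and factor are illustrative only. A value such as "2006" for "May Day" is left out because it cannot be resolved from the string alone.

```python
from datetime import date

# Hypothetical raw values; the parsing rule below is an assumption of this sketch.
RAW = ["2006-05-01", "5-1-2006", "2006-5-1", "2006-05-02"]


def normalise(value: str) -> str:
    """Map a date string to its canonical yyyy-mm-dd form."""
    parts = [int(p) for p in value.split("-")]
    if parts[0] > 31:
        # Year comes first: yyyy-m-d
        year, month, day = parts
    else:
        # Otherwise assume month-day-year: m-d-yyyy
        month, day, year = parts
    return date(year, month, day).isoformat()


def deduplicate(values):
    """Keep the first occurrence of each value, preserving order."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result


def factor(values):
    """Group the original spellings under their shared canonical form."""
    groups = {}
    for v in values:
        groups.setdefault(normalise(v), []).append(v)
    return groups


# Normalising alone changes the values but removes nothing.
normalised = [normalise(v) for v in RAW]

# De-duplicating the canonical forms collapses the equivalent dates.
unique = deduplicate(normalised)

# Factoring keeps every original spelling, keyed by the canonical form.
factored = factor(RAW)
```

Here normalised still has four entries, unique has two, and factored retains all four original spellings under two keys. That roughly mirrors the two positions in the thread: normalise-then-deduplicate discards the differing spellings, whereas factoring out a canonical key loses no information.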