
Re: [xml-dev] Something altogether different?




One disadvantage of term-based weighting, or the vector space model, is
shown by the well-known example cited in Google's original paper (or
rather, sales pitch?):

A document containing only the words "Bill Clinton sucks" was considered
more important for the query "Bill Clinton" than the actual White House
page (when Clinton was the president).
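
To make the effect concrete, here is a minimal sketch using plain
term-frequency vectors and cosine similarity. The text of long_doc is a
made-up stand-in for the White House page, and the cosine helper is my own
illustration, not the scoring used by any particular engine.

```python
import math
from collections import Counter

def cosine(query, doc):
    """Cosine similarity between raw term-frequency vectors."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

query = "bill clinton"
short_doc = "bill clinton sucks"
# Hypothetical stand-in for a longer, authoritative page.
long_doc = ("the white house is the official residence of the president "
            "bill clinton and his staff")

short_score = cosine(query, short_doc)
long_score = cosine(query, long_doc)
print(short_score > long_score)
```

The three-word document wins because length normalization rewards it for
having almost nothing besides the query terms.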

I believe we can use the vector space model only when the document
collection is "homogeneous" in some manner, for example when the documents
share a repetitive vocabulary.

Also note: with the vector space model, you have to compute the rank of
documents in real time for each query.

For other metrics such as PageRank, the rank of documents does not depend
on the query, so it can be pre-computed, and we can use better algorithms
based on this property.
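
As a rough sketch of what "pre-computed" means here, the following runs a
simple power iteration over a tiny, invented link graph. The page names,
the damping value of 0.85, and the fixed iteration count are assumptions
for illustration only.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; every link target must be a key of links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
links = {
    "whitehouse": ["news"],
    "news": ["whitehouse"],
    "blog": ["whitehouse", "news"],
    "rant": ["whitehouse"],
}

# Computed once, offline; a query only needs a dictionary lookup afterwards.
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

Nothing in the computation looks at a query, so the scores can be stored
and reused, whereas the term-vector score has to be recomputed per query.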

best, murali.

 