[Corpora-List] Similarity between documents

Max CHEVALIER Max.Chevalier at irit.fr
Sun Mar 22 00:39:57 UTC 2009


> Dear All
>
> *Does anyone know of a script available online to compute the
> similarity (cosine of the angle) between documents?*
>
> Best regards
>
> J.R. Colt Clint
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

The cosine measure is well known in the IR field. It is built into
Lucene and other search engines.
You can find its definition in any book on IR (e.g. Modern
Information Retrieval, Baeza-Yates & Ribeiro-Neto,
http://people.ischool.berkeley.edu/~hearst/irbook/) or on text data
mining. It is really simple to implement: represent each document as a
vector of term weights and divide the dot product of the two vectors by
the product of their lengths.
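
To give an idea of how little code is involved, here is a minimal
sketch in Python. It is only an illustration, not what Lucene does:
it assumes raw term counts as weights (real IR systems usually use
tf-idf), a naive regular-expression tokenizer, and the function names
term_counts and cosine_similarity are made up for this example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count word tokens (deliberately naive tokenizer)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the raw term-count vectors of two texts."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Dot product: only terms present in both documents contribute,
    # because a Counter returns 0 for a missing term.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_a * norm_b)
```

Two documents with the same term counts score 1 (up to rounding), and
two documents with no term in common score 0. Swapping the raw counts
for tf-idf weights only changes term_counts, not the cosine itself.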

You can also find some relevant Java source, along with many other
similarity measures, at
http://sujitpal.blogspot.com/2008/09/ir-math-with-java-similarity-measures.html
Note that I have not tested it.

Best regards,

Max CHEVALIER.
---------------------------------
IRIT - Toulouse
France
http://www.irit.fr/~Max.Chevalier
