[Corpora-List] Developing and testing new similarity measures for word clustering
Normand Peladeau
peladeau at simstat.com
Fri Oct 8 12:47:02 UTC 2004
I have been reviewing some of the similarity measures used to perform word
clustering (Jaccard, Dice, Simple Matching, correlation, etc.), and I have come
to the conclusion that many of these measures have metric problems that
probably make them suboptimal for word clustering.
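For reference, here is a minimal sketch of the traditional measures named above, computed from the 2x2 co-occurrence table of two words across documents. It assumes binary (presence/absence) word-by-document data, and it takes "correlation" to mean the phi coefficient (Pearson's r on two binary vectors). The function names are illustrative only and do not come from any particular package.

```python
import math

def contingency(x_docs, y_docs, n_docs):
    """2x2 co-occurrence counts for two words, given the sets of document IDs
    in which each word occurs and the total number of documents."""
    a = len(x_docs & y_docs)   # both words present
    b = len(x_docs - y_docs)   # only x present
    c = len(y_docs - x_docs)   # only y present
    d = n_docs - a - b - c     # neither word present (joint absence)
    return a, b, c, d

def jaccard(a, b, c, d):
    # Ignores joint absences (d)
    return a / (a + b + c) if a + b + c else 0.0

def dice(a, b, c, d):
    # Like Jaccard, but gives double weight to co-occurrences
    return 2 * a / (2 * a + b + c) if a + b + c else 0.0

def simple_matching(a, b, c, d):
    # Counts joint absences (d) as agreement
    return (a + d) / (a + b + c + d)

def phi(a, b, c, d):
    # Pearson correlation of two binary vectors (phi coefficient)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

if __name__ == "__main__":
    counts = contingency({0, 1, 2}, {1, 2, 3}, n_docs=10)
    print(counts)  # (2, 1, 1, 6)
    for measure in (jaccard, dice, simple_matching, phi):
        print(measure.__name__, round(measure(*counts), 3))
    # jaccard 0.5, dice 0.667, simple_matching 0.8, phi 0.524
```

Running the same functions on two rare words that never co-occur, e.g. `contingency({0}, {1}, n_docs=1000)`, gives a Simple Matching score of 0.998 but a Jaccard score of 0.0. This illustrates one frequently cited property (joint absences dominate the score in sparse data), though not necessarily the specific problems the post has in mind.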
I am now working on modified versions of these indices, and I need ways to
benchmark the new similarity measures. I would like to have a series of
benchmarks for several kinds of applications (dimension reduction, automatic
identification of themes, automatic taxonomy development, etc.).
I would appreciate suggestions on ways to benchmark these new measures and
compare their performance with the more traditional ones. Any ideas,
references, or data sets would be welcome.
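One common way to benchmark a word-similarity measure is to correlate its scores with a reference set of similarity judgments for the same word pairs, using Spearman's rank correlation. The sketch below assumes such reference scores are available; the numbers in it are hypothetical placeholders, not real data, and the helper names are likewise hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

if __name__ == "__main__":
    # Hypothetical reference judgments for four word pairs
    reference = [0.9, 0.7, 0.4, 0.1]
    # Hypothetical scores from two similarity measures on the same pairs
    measure_a = [0.8, 0.6, 0.5, 0.2]
    measure_b = [0.3, 0.6, 0.5, 0.2]
    print(spearman(reference, measure_a))  # 1.0
    print(spearman(reference, measure_b))  # 0.4
```

Ranks are used rather than raw scores because different similarity indices live on different scales; only their ordering of word pairs is compared. For the clustering-oriented applications (themes, taxonomies), a partition-agreement score against a reference clustering would play the same role.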
I am also looking for existing articles in which these measures have been
compared (either empirically or theoretically).
Thanks,
Normand Peladeau
Provalis Research