[Corpora-List] Developing and testing new similarity measures for word clustering

Fri Oct 8 12:47:02 UTC 2004

I have been reviewing some of the similarity measures used for word
clustering (Jaccard, Dice, Simple Matching, correlation, etc.), and I have
come to the conclusion that many of them have metric problems that
probably make them suboptimal for word clustering.
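
For concreteness, here is a minimal Python sketch of the traditional
measures named above, computed on binary occurrence vectors (1 if a word
occurs in a document, 0 otherwise).  The function names and the toy
vectors are illustrative assumptions, not taken from any particular
package, and "correlation" is taken here to mean the phi coefficient,
i.e. Pearson's r on binary data.

```python
from math import sqrt

def contingency(x, y):
    """Return (a, b, c, d): both present, x only, y only, both absent."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0

def dice(x, y):
    a, b, c, _ = contingency(x, y)
    return 2 * a / (2 * a + b + c) if (2 * a + b + c) else 0.0

def simple_matching(x, y):
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0

def phi(x, y):
    # Pearson correlation for two binary variables (assumed reading of
    # "correlation" in this sketch).
    a, b, c, d = contingency(x, y)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Hypothetical occurrence of two words across eight documents.
word_x = [1, 1, 1, 0, 0, 0, 0, 0]
word_y = [1, 1, 0, 1, 0, 0, 0, 0]

for name, measure in [("Jaccard", jaccard), ("Dice", dice),
                      ("Simple Matching", simple_matching), ("phi", phi)]:
    print(f"{name}: {measure(word_x, word_y):.4f}")
```

Note that Simple Matching counts joint absences (d) as agreement, whereas
Jaccard and Dice ignore them, so the measures can rank the same word pairs
quite differently on sparse data.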

I am now working on modified versions of those indices, and I need
ways to benchmark the new similarity measures.  I would like to have a
series of benchmarks covering several kinds of applications (dimension
reduction, automatic identification of themes, automatic taxonomy
development, etc.).

I would appreciate suggestions on how to benchmark the new measures and
compare their performance with that of the more traditional ones.  Any
ideas, references, or data sets would be welcome.

I am also looking for existing articles in which those measures have been
compared (either empirically or theoretically).

Thanks,

Normand Peladeau
Provalis Research