[Corpora-List] Developing and testing new similarity measures for word clustering

Fri Oct 8 21:19:43 UTC 2004

One place where similarity evaluation metrics have come up is, of
all places, the music community:

http://www.ee.columbia.edu/~dpwe/research/musicsim/metrics.html

Also see the list of papers here:
http://www.ee.columbia.edu/~dpwe/research/musicsim/

I'd be really interested to see what you come up with, and hope you
post a summary.

Cheers,

Dinoj Surendran
PhD Student
Computer Science Department
University of Chicago
http://people.cs.uchicago.edu/~dinoj

On Fri, 08 Oct 2004 08:47:02 -0400, Normand Peladeau
<peladeau at simstat.com> wrote:
> I have been reviewing some of the similarity measures used to perform word
> clustering (Jaccard, Dice, Simple Matching, correlation, etc.) and I came
> to the conclusion that many of those measures had some metric problems that
> probably make them non-optimal for word clustering.
>
> I am now working on some modified versions of those indices and I need some
> ways to benchmark those new similarity measures.  I would like to have a
> series of benchmarks for several kinds of application (dimension reduction,
> automatic identification of themes, automatic taxonomy development, etc.).
>
> I would like suggestions for ways to benchmark those new measures and
> compare their performance with the more traditional ones.  Any ideas,
> references, or data sets would be welcome.
>
> I am also looking for existing articles where those measures have been
> compared (either empirically or theoretically).
>
> Thanks,
>
> Normand Peladeau
> Provalis Research
>
>
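
For readers unfamiliar with the traditional measures named in the quoted
message, here is a minimal sketch of Jaccard, Dice, and Simple Matching
computed on binary word-occurrence vectors (one entry per document, 1 if
the word occurs in it). The function names and the toy vectors are
illustrative assumptions, not anything from the original posts, and the
sketch does not attempt to reproduce the modified indices mentioned there.

```python
def contingency(x, y):
    """Return (a, b, c, d) for two equal-length binary vectors.

    a: both present, b: only x present, c: only y present, d: both absent.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def jaccard(x, y):
    # Shared occurrences over documents containing at least one of the words.
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def dice(x, y):
    # Like Jaccard, but shared occurrences are counted twice.
    a, b, c, _ = contingency(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def simple_matching(x, y):
    # Counts joint absences (d) as agreement, unlike Jaccard and Dice.
    a, b, c, d = contingency(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


# Hypothetical toy data: occurrence of two words across eight documents.
word_a = [1, 1, 0, 0, 0, 0, 0, 0]
word_b = [1, 0, 1, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print("Jaccard:        ", jaccard(word_a, word_b))
    print("Dice:           ", dice(word_a, word_b))
    print("Simple Matching:", simple_matching(word_a, word_b))
```

On this toy pair the two words co-occur in only one document, yet Simple
Matching scores them highest of the three because it treats the five
documents containing neither word as agreement. With sparse word-by-document
data that term tends to dominate, which is one commonly noted difference
between these measures.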