[Corpora-List] semantic similarity

ted pedersen tpederse at d.umn.edu
Thu Jan 20 18:47:16 UTC 2005


Hi Jana,

WordNet-Similarity is an implemented system that will let you measure the
semantic similarity between words in text using a host of well-known
methods, including those of Resnik, Jiang & Conrath, Leacock & Chodorow,
Wu & Palmer, shortest path, adapted Lesk, Hirst & St-Onge, and even a
context vector measure. It does all this based on information from
WordNet. The code is in Perl and is free, and of course WordNet is free
too. Download it from:

	http://search.cpan.org/dist/WordNet-Similarity or
	http://wn-similarity.sourceforge.net
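For readers new to these measures, here is a minimal Python sketch of five of them on a tiny, hypothetical is-a hierarchy with made-up corpus counts. The concept names, counts, and function names are illustrative assumptions, not WordNet data and not the WordNet-Similarity Perl API. It uses the commonly stated textbook definitions, with lcs the lowest common subsumer and IC(c) = -log p(c): path = 1 / (nodes on the shortest path), Wu & Palmer = 2 * depth(lcs) / (depth(a) + depth(b)), Leacock & Chodorow = -log(path nodes / (2 * max depth)), Resnik = IC(lcs), and Jiang & Conrath = 1 / (IC(a) + IC(b) - 2 * IC(lcs)). The real package may handle details such as multiple inheritance and root nodes differently.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent); NOT real WordNet data.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts for leaf concepts, used for information content.
COUNTS = {"dog": 30, "cat": 20, "car": 50}
TOTAL = sum(COUNTS.values())


def ancestors(c):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(c):
    # Depth counted in nodes, so the root has depth 1.
    return len(ancestors(c))


def lcs(a, b):
    """Lowest common subsumer: the first ancestor of a that also subsumes b."""
    b_anc = set(ancestors(b))
    return next(x for x in ancestors(a) if x in b_anc)


def path_length(a, b):
    # Edges from a up to the lcs plus edges from b up to the lcs (a tree here).
    s = lcs(a, b)
    return ancestors(a).index(s) + ancestors(b).index(s)


def path_sim(a, b):
    # Inverse of the shortest path length counted in nodes (edges + 1).
    return 1.0 / (path_length(a, b) + 1)


def wup(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def lch(a, b):
    # max_depth is the depth of the deepest concept in the hierarchy.
    max_depth = max(depth(c) for c in set(PARENT) | set(PARENT.values()))
    return -math.log((path_length(a, b) + 1) / (2.0 * max_depth))


def ic(c):
    # IC = -log p(c) = log(TOTAL / freq); freq sums all leaf counts below c.
    freq = sum(n for leaf, n in COUNTS.items() if c in ancestors(leaf))
    return math.log(TOTAL / freq)


def resnik(a, b):
    return ic(lcs(a, b))


def jcn(a, b):
    dist = ic(a) + ic(b) - 2.0 * ic(lcs(a, b))
    return float("inf") if dist == 0 else 1.0 / dist


for a, b in [("dog", "cat"), ("dog", "car")]:
    print(a, b, round(path_sim(a, b), 3), round(wup(a, b), 3),
          round(lch(a, b), 3), round(resnik(a, b), 3), round(jcn(a, b), 3))
```

Every measure scores dog/cat above dog/car: their lowest common subsumer (animal) is deeper and more specific than the one for dog/car (entity), and the path between them is shorter.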

Now, WordNet-Similarity will get you started in measuring semantic
similarity (or relatedness, with a few measures). We have also been
working on an algorithm based on WordNet-Similarity that will measure how
related a word is to its neighbors in a text.

This algorithm is called WordNet-SenseRelate; it works with plain
text and is again based on WordNet. Our goal in this package is to
carry out word sense disambiguation of all the content words in a text,
but what's really happening under the surface is what you are aspiring to
do: finding nearby words that are similar to each other
(in our case according to the measures in WordNet-Similarity).

Again in Perl, and again free. Download from:

	http://search.cpan.org/dist/WordNet-SenseRelate
	http://www.d.umn.edu/~tpederse/~senserelate.html
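The idea described above can be illustrated with a minimal Python sketch. For each content word, it picks the candidate sense whose summed relatedness to the senses of its neighbors within a small window is highest. The sense inventory, the topic tags, and the relatedness function are all hypothetical stand-ins (a real system would draw senses from WordNet and score them with one of the WordNet-Similarity measures). This is not the WordNet-SenseRelate code or API.

```python
# Hypothetical sense inventory (word -> candidate senses); NOT WordNet data.
SENSES = {
    "bank": ["bank#river", "bank#finance"],
    "money": ["money#currency"],
    "deposit": ["deposit#finance", "deposit#sediment"],
    "river": ["river#stream"],
}

# Hypothetical topic tags standing in for a real relatedness measure.
TOPIC = {
    "bank#river": "water", "bank#finance": "finance",
    "money#currency": "finance", "deposit#finance": "finance",
    "deposit#sediment": "water", "river#stream": "water",
}


def relatedness(s1, s2):
    # Placeholder measure: 1.0 if two senses share a topic tag, else 0.0.
    return 1.0 if TOPIC[s1] == TOPIC[s2] else 0.0


def disambiguate(tokens, window=2):
    """Assign each known word the sense most related to its neighbors."""
    content = [t for t in tokens if t in SENSES]  # skip words with no senses
    result = []
    for i, word in enumerate(content):
        lo, hi = max(0, i - window), min(len(content), i + window + 1)
        neighbors = [content[j] for j in range(lo, hi) if j != i]

        def score(sense):
            # For each neighbor, take its best-matching sense, then sum.
            return sum(max(relatedness(sense, ns) for ns in SENSES[n])
                       for n in neighbors)

        # max() returns the first maximal element, so ties go to the first-listed sense.
        result.append((word, max(SENSES[word], key=score)))
    return result


print(disambiguate("deposit money in the bank".split()))
print(disambiguate("the bank of the river".split()))
```

Here the window counts only words that appear in the toy inventory, and ties go to the first-listed sense. Both are choices of this sketch, not documented behavior of the package.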

I hope one or both of these are of interest to you. Let us know if you
have any additional questions!

Cordially,
Ted

 On Thu, 20 Jan 2005, Jana Diesner wrote:

> Dear list members,
>
> We are looking for strategies, algorithms or code to automatically find
> single terms or multiple adjacent terms that are semantically similar within
> and across documents. The approach must not require POS tagging or an
> initial input of a reference term to the system. The resulting clusters of
> semantically similar terms suggested by the system do not need to be
> exclusive. We are familiar with secondstring, the software developed by
> William Cohen, and semantic similarity based on string-edit distances.
>
> Thank you very much.
>
> Jana
>
>
>
> ____________________
>
> Jana Diesner
> Carnegie Mellon University
>
> jdiesner at andrew.cmu.edu
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
