[Corpora-List] semantic similarity

Adam Kilgarriff adam at lexmasterclass.com
Thu Jan 20 20:32:08 UTC 2005


Jana

> The approach must not require POS tagging

Why ever not? Ten years ago there was an excuse for ignoring syntax (no
tools, too slow to run over big corpora, expensive), but I don't think there
is any more.

You get much better results if you respect syntax (see e.g. the thesaurus at
www.sketchengine.co.uk, which shallow-parses and uses Dekang Lin's
similarity measure).
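
To give a rough idea of what a Lin-style measure computes, here is a minimal
Python sketch. It assumes a shallow parser has already produced
(word, relation, other_word) triples, and it weights each (relation,
other_word) feature with plain pointwise mutual information, which is a
simplification of Lin's relation-specific formulation. The function names and
the toy triples are invented for illustration; this is not the Sketch Engine
implementation.

```python
import math
from collections import Counter, defaultdict


def positive_pmi(triples):
    """Map each word to {feature: PMI}, keeping only features with PMI > 0.

    A feature is a (relation, other_word) pair taken from a parsed triple.
    """
    pair_counts = Counter((w, (r, o)) for w, r, o in triples)
    word_counts = Counter()
    feat_counts = Counter()
    for (w, f), c in pair_counts.items():
        word_counts[w] += c
        feat_counts[f] += c
    total = sum(pair_counts.values())

    weights = defaultdict(dict)
    for (w, f), c in pair_counts.items():
        pmi = math.log(c * total / (word_counts[w] * feat_counts[f]))
        if pmi > 0:
            weights[w][f] = pmi
    return weights


def lin_similarity(weights, w1, w2):
    """Information in the shared features over information in all features."""
    f1 = weights.get(w1, {})
    f2 = weights.get(w2, {})
    denom = sum(f1.values()) + sum(f2.values())
    if denom == 0:
        return 0.0
    shared = f1.keys() & f2.keys()
    return sum(f1[f] + f2[f] for f in shared) / denom


# Toy triples (hypothetical data, standing in for shallow-parser output).
triples = [
    ("beer", "obj_of", "drink"),
    ("wine", "obj_of", "drink"),
    ("beer", "mod", "cold"),
    ("wine", "mod", "red"),
    ("car", "obj_of", "drive"),
    ("car", "mod", "red"),
    ("truck", "obj_of", "drive"),
]
weights = positive_pmi(triples)
print(lin_similarity(weights, "beer", "wine"))
print(lin_similarity(weights, "beer", "car"))
```

Words that share grammatical contexts (beer and wine are both objects of
"drink") score above words that share none, which is the point of respecting
syntax rather than counting bare co-occurrence.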


Adam Kilgarriff


-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Jana Diesner
Sent: 20 January 2005 16:36
To: CORPORA at hd.uib.no
Subject: [Corpora-List] semantic similarity

Dear list members,

We are looking for strategies, algorithms or code to automatically find
single terms or multiple adjacent terms that are semantically similar within
and across documents. The approach must not require POS tagging or an
initial input of a reference term to the system. The resulting clusters of
semantically similar terms suggested by the system do not need to be
exclusive. We are familiar with secondstring, the software developed by
William Cohen, and semantic similarity based on string-edit distances.
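
As a rough illustration of what a string-edit-distance comparison does, here
is a minimal Python sketch of Levenshtein distance with a length-normalised
score. It is a generic textbook version, not SecondString's API, and the
function names are invented for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(edit_similarity("car", "cat"))
print(edit_similarity("car", "automobile"))
```

Scores of this kind reflect spelling only: "car" comes out closer to "cat"
than to "automobile".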



Thank you very much.

Jana



____________________

Jana Diesner
Carnegie Mellon University

jdiesner at andrew.cmu.edu


