CORPORA: approval required (DA7EB695) (fwd)

Listserv Administrator listman at listserv.linguistlist.org
Sat Aug 4 18:10:06 UTC 2007




---------- Forwarded message ----------
Date: Wed, 18 Jul 2007 03:19:53 -0400
From: "The LINGUIST List. LISTSERV Server (14.4)"
     <LISTSERV at LISTSERV.LINGUISTLIST.ORG>
To: Listserv Administrator <listman at LISTSERV.LINGUISTLIST.ORG>
Subject: CORPORA: approval required (DA7EB695)

This message was originally submitted by corpora-bounces at UIB.NO to the CORPORA
list at LISTSERV.LINGUISTLIST.ORG. You can approve it using the "OK" mechanism
(click on the link below), ignore it, or repost an edited copy. The message
will expire automatically and you do not need to do anything if you just want
to discard it. Please refer to the list owner's guide if you are not familiar
with the "OK" mechanism; these instructions are being kept purposefully short
for your convenience in processing large numbers of messages.

To APPROVE the message:
http://listserv.linguistlist.org/cgi-bin/wa?OK=DA7EB695&L=CORPORA
-------------- next part --------------
Dear Nuno,

Some of us are of the opinion that measures of semantic similarity are best
obtained through the proxy of distributional similarity.  While there is of
course an argument that they are simply not the same thing, distributional
similarity has the decided advantage that similarity scores are objective,
inexpensive, and readily available for the full lexicon.
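
For concreteness, here is a toy sketch of how a distributional similarity score
can be computed: represent each word by counts of the words occurring near it,
then compare those count vectors with cosine similarity. The corpus, window
size and weighting below are placeholders; real distributional thesauruses are
built from very large corpora and richer context representations.

import math
from collections import Counter, defaultdict

# Toy corpus and context window; real systems use large corpora and
# grammatical relations or weighted contexts rather than raw window counts.
corpus = "the gem sparkled on the ring the jewel sparkled on the crown".split()
window = 2

# Collect co-occurrence counts: each word is represented by the words
# appearing within the window around its occurrences.
contexts = defaultdict(Counter)
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            contexts[w][corpus[j]] += 1

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[x] * b[x] for x in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(cosine(contexts["gem"], contexts["jewel"]))  # similar contexts -> high score
print(cosine(contexts["gem"], contexts["on"]))     # less similar contexts -> lower score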

Thesauruses (based on distributional similarity) for seven major world
languages can be viewed at http://sketchengine.co.uk  

Methods for exploring the hypothesis that, roughly, "distributional
thesauruses are better than manual ones (for NLP purposes)" are discussed in

Calvo, H., Gelbukh, A. and Kilgarriff, A. 2005. Automatic Thesaurus vs.
WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment.
Proc. CICLING, 5th Int. Conf. on Intelligent Text Processing and Computational
Linguistics, Mexico City. Springer Verlag.
<http://www.kilgarriff.co.uk/Publications/2005-CalvoGelbukhKilg-CICLING-PPattachThes.pdf>

Kilgarriff, A. 2003. Thesauruses for Natural Language Processing. Keynote
lecture. Proc. Natural Language Processing and Knowledge Engineering (NLPKE),
Beijing, October.
<http://www.kilgarriff.co.uk/Publications/2003-K-Beijing-thes4NLP.pdf>
(Apologies for self-citation)

Regards

Adam Kilgarriff


-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Nuno Seco
Sent: 17 July 2007 18:33
To: CORPORA at UIB.NO
Subject: [Corpora-List] Call for Participation - Semantic Similarity Experiment

[Apologies for cross-postings]
[Please distribute to potentially interested parties]

In the context of a joint research project, we are asking fellow researchers to
contribute about 10 minutes of their time and collaborate in an experiment that
(we hope) will help us gather a large dataset of similarity ratings for pairs
of words. Participation is quite simple, so if you are interested please read
the section HOW TO PARTICIPATE. If you want to learn more about the experiment,
please read the section INTRODUCTION.

Thanks in advance,

Giuseppe Pirrò & Nuno Seco

----------------------------------------------------------------------------

Introduction:

Semantic similarity plays an important role in Information Retrieval,
Natural Language Processing, Ontology Mapping and other related fields of
research.

In particular, researchers have developed a variety of semantic similarity
and relatedness measures by exploiting information found in lexical resources
such as WordNet. Current similarity metrics based on WordNet can be classified
into one of the following categories (an illustrative sketch of the first two
follows the list):

    Edge-Counting measures that are based on the number of links relating
two concepts that are being compared.

    Information Content measures that are based on the idea that the
similarity of two concepts is related to the amount of information they have
in common.

    Feature-Based measures that exploit the features (e.g., descriptions in
natural language) of a term while usually ignoring its location in the
taxonomy.

    Hybrid measures that combine ideas from previous categories.
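
For instance, an edge-counting measure and an Information Content measure can
be sketched roughly as follows with NLTK's WordNet interface (this assumes the
NLTK 'wordnet' and 'wordnet_ic' data are installed, and takes the word-level
score to be the maximum over noun synset pairs):

import math
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # corpus-derived Information Content counts

def word_similarity(w1, w2, measure):
    """Word-level similarity: maximum over all noun synset pairs."""
    scores = [measure(s1, s2)
              for s1 in wn.synsets(w1, pos=wn.NOUN)
              for s2 in wn.synsets(w2, pos=wn.NOUN)]
    scores = [s for s in scores if s is not None and math.isfinite(s)]
    return max(scores) if scores else 0.0

for w1, w2 in [('gem', 'jewel'), ('noon', 'string')]:
    path = word_similarity(w1, w2, lambda a, b: a.path_similarity(b))          # edge counting
    lin = word_similarity(w1, w2, lambda a, b: a.lin_similarity(b, brown_ic))  # Information Content
    print('%s-%s: path=%.3f lin=%.3f' % (w1, w2, path, lin))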

In order to evaluate the suitability of the various similarity measures, they
are usually compared against human judgements by calculating correlation
values. A typical reference, in terms of evaluation, is the set of results from
the Rubenstein and Goodenough (R&G) experiment. In 1965, R&G obtained "synonymy
judgments" from 51 human subjects on 65 pairs of words. The pairs ranged from
"highly synonymous" (gem-jewel) to "semantically unrelated" (noon-string).
Subjects were asked to rate them on a scale of 0.0 to 4.0 according to their
"similarity of meaning", ignoring any other observed semantic relationships.
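
The evaluation step itself amounts to computing a correlation coefficient
between a measure's scores and the human ratings over the same word pairs. A
minimal sketch, assuming SciPy is available (the numbers below are
placeholders, not actual R&G data):

from scipy.stats import pearsonr, spearmanr

# Placeholder ratings for three word pairs (0.0-4.0 human scale) and the
# corresponding scores produced by some similarity measure.
human_ratings = [3.9, 3.5, 0.1]
measure_scores = [0.95, 0.80, 0.05]

print('Pearson r    =', pearsonr(human_ratings, measure_scores)[0])
print('Spearman rho =', spearmanr(human_ratings, measure_scores)[0])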

Although other experiments similar to R&G's have been carried out since, we
are not aware of similarity experiments aimed at showing how robust the
different measures are when compared against different versions of WordNet.
With this objective in mind, we want to collect human similarity estimations
on the whole Rubenstein and Goodenough dataset and subsequently compare the
outputs of existing similarity measures. We chose to adopt the R&G dataset
since others have worked on it, thus permitting direct comparison of results
obtained by different experiments.

Moreover, we want to show the suitability of an Information Content metric
that relies solely on the WordNet taxonomy, without requiring an external
collection of texts.
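
One well-known way of obtaining Information Content from the taxonomy alone is
to count hyponyms, so that leaf concepts are maximally informative and the root
carries almost no information. The sketch below illustrates that idea using
NLTK's WordNet data; it is not necessarily the exact metric referred to above.

import math
from nltk.corpus import wordnet as wn

# Size of the noun taxonomy; used to normalise the hyponym counts.
TOTAL_NOUN_SYNSETS = len(list(wn.all_synsets(pos=wn.NOUN)))

def intrinsic_ic(synset):
    """Taxonomy-only IC: 1 for leaves, approaching 0 for the root."""
    hyponym_count = len(list(synset.closure(lambda s: s.hyponyms())))
    return 1.0 - math.log(hyponym_count + 1) / math.log(TOTAL_NOUN_SYNSETS)

print(intrinsic_ic(wn.synset('entity.n.01')))  # very general concept: IC near 0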


How to participate:

In order to participate in the similarity experiment, point your browser to:
http://grid.deis.unical.it/similarity/
Then, by clicking on the register link, you can register and immediately
receive a password via email.
After logging in, you should indicate similarity values for all the word pairs
by using the slider provided for each pair. The estimated time required is
about 10 minutes, including time for registering.
Results of the experiment and the data will be published as soon as we collect
a significant number of ratings.
 


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

