[Corpora-List] SemEval Task #8 (Semantic Relations): Call for participation

Preslav Nakov (GMail) preslavn at gmail.com
Sat Mar 6 12:21:33 UTC 2010


Apologies for cross-posting.

*******************************************************

              Second Call for Participation

           SemEval-2010 Shared Task #8:
   Multi-Way Classification of Semantic Relations
            Between Pairs of Nominals

http://docs.google.com/View?docid=dfvxd49s_36c28v9pmw

          --- *Training* data available ---

*******************************************************

This shared task should be of interest to researchers working on
 * semantic relation extraction
 * information extraction
 * lexical semantics


============
 Background
============

Recently, the NLP community has shown a renewed interest in deeper semantic
analysis, including automatic recognition of semantic relations between
pairs of words. This is an important task with many potential applications
in Information Retrieval, Information Extraction, Text Summarization,
Machine Translation, Question Answering, Paraphrasing, Recognizing Textual
Entailment, Thesaurus Construction, Semantic Network Construction, Word
Sense Disambiguation, and Language Modelling.

Despite this interest, progress has been slow, largely because the
classification schemes proposed and used have been mutually incompatible,
which makes it difficult to compare classification algorithms. Most of the
datasets used so far provided no context for the target relation, thus relying
on the unrealistic assumption that semantic relations are largely
context-independent. A notable exception is SemEval-2007 Task 4:
Classification of Semantic Relations between Nominals, which for the first
time provided a standard benchmark dataset for seven semantic relations *in
context*. However, in order to avoid the challenge of defining a single
unified standard classification scheme, this dataset treated each semantic
relation separately, as a single two-class (positive vs. negative)
classification task, rather than as multi-way classification. While some
subsequent publications tried to use the dataset in a multi-way setup,
it was not designed to be used in that manner.

We believe that having a freely available standard benchmark dataset for
*multi-way* semantic relation classification *in context* is much needed for
the overall advancement of the field. Thus, our primary objective has been the
challenging task of preparing and releasing such a dataset to the research
community. We have further set up a common evaluation task that will enable
researchers to compare their algorithms.
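The exact scoring procedure is documented on the task website rather than in
this call. Purely as an illustration, the short Python sketch below computes
macro-averaged F1 over the directed relation labels with Other excluded, which
is one natural way to score multi-way relation classification; the function
name macro_f1 and the toy labels are ours, and the official scorer's
conventions may differ in detail.

from collections import defaultdict

def macro_f1(gold, pred, skip_label="Other"):
    """Macro-averaged F1 over directed relation labels, excluding `skip_label`.
    `gold` and `pred` are parallel lists of labels such as "Cause-Effect(e1,e2)"."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    labels = {lab for lab in (*tp, *fp, *fn) if lab != skip_label}
    scores = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Toy check: two of three directed labels predicted correctly.
gold = ["Component-Whole(e1,e2)", "Cause-Effect(e2,e1)", "Member-Collection(e1,e2)"]
pred = ["Component-Whole(e1,e2)", "Cause-Effect(e2,e1)", "Other"]
print(round(macro_f1(gold, pred), 3))   # 0.667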


==========
 The Task
==========

Task: Given a sentence and two annotated nominals, choose the most suitable
relation from the following inventory of nine relations:

    * Relation 1 (Cause-Effect)       
    * Relation 2 (Instrument-Agency)
    * Relation 3 (Product-Producer)
    * Relation 4 (Content-Container)
    * Relation 5 (Entity-Origin)
    * Relation 6 (Entity-Destination)
    * Relation 7 (Component-Whole)
    * Relation 8 (Member-Collection)
    * Relation 9 (Message-Topic)

It is also possible to choose Other if none of the nine relations appears to
be suitable.


Example: The best choice for the following sentence would be
Component-Whole(e1,e2):

"The <e1>macadamia nuts</e1> in the <e2>cake</e2> also make it necessary to
have a very sharp knife to cut through the cake neatly."

Note that in the above sentence, Component-Whole(e1,e2) holds, but
Component-Whole(e2,e1) does not, i.e., we have Other(e2,e1). Thus, the task
requires determining *both* the relation and the order of e1 and e2 as its
arguments.
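To make the label space concrete, here is a small illustrative Python sketch
(the names RELATIONS, LABELS and extract_nominals are ours, not part of the
official data tools). It pulls the two annotated nominals out of the <e1>/<e2>
markup and enumerates the labels a system must choose among: each of the nine
relations in either argument order, plus the undirected Other, i.e. 19 labels
in total.

import re

# The nine relations listed above; each can hold in either argument order,
# and Other is the undirected fallback, giving 9 * 2 + 1 = 19 possible labels.
RELATIONS = [
    "Cause-Effect", "Instrument-Agency", "Product-Producer",
    "Content-Container", "Entity-Origin", "Entity-Destination",
    "Component-Whole", "Member-Collection", "Message-Topic",
]
LABELS = [f"{r}({a})" for r in RELATIONS for a in ("e1,e2", "e2,e1")] + ["Other"]

def extract_nominals(sentence):
    """Pull the two annotated nominals out of the <e1>/<e2> markup."""
    e1 = re.search(r"<e1>(.*?)</e1>", sentence).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", sentence).group(1)
    return e1, e2

sentence = ("The <e1>macadamia nuts</e1> in the <e2>cake</e2> also make it "
            "necessary to have a very sharp knife to cut through the cake neatly.")
print(extract_nominals(sentence))   # ('macadamia nuts', 'cake')
print(len(LABELS))                  # 19
# For this sentence the gold label is "Component-Whole(e1,e2)";
# the reversed reading, Component-Whole(e2,e1), does not hold.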


==========
 Datasets
==========

* Training Dataset: The training dataset consists of a total of 8,000
examples.

* Test Dataset: The test dataset consists of 2,717 examples; it will be
released on March 18, 2010.

License: All data are released under the Creative Commons Attribution 3.0
Unported license.
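The file format itself is documented with the released data, not in this call.
The reader below is a minimal sketch that assumes the layout of the released
training file (a numbered, tab-separated, quoted sentence line, followed by the
relation label, an optional Comment line, and a blank separator line); the
function read_examples and the filename TRAIN_FILE.TXT are our assumptions, so
adapt the code if the actual files differ.

def read_examples(path):
    """Read (id, sentence, relation) records from the assumed distribution format."""
    examples = []
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    i = 0
    while i < len(lines):
        if not lines[i].strip():        # skip blank separator lines
            i += 1
            continue
        idx, sentence = lines[i].split("\t", 1)
        sentence = sentence.strip().strip('"')
        relation = lines[i + 1].strip()
        i += 2
        if i < len(lines) and lines[i].startswith("Comment"):
            i += 1                      # optional annotator comment
        examples.append({"id": int(idx), "sentence": sentence, "relation": relation})
    return examples

# train = read_examples("TRAIN_FILE.TXT")   # assumed filename in the archive
# print(len(train))                         # expected: 8000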


===============
 Time Schedule
===============

* Trial data released:               August 30, 2009
* Training data released:            March 5, 2010
* Test data release:                 March 18, 2010
* Result submission deadline:        7 days after downloading the *test*
                                     data, but no later than April 2, 2010

* Organizers send the test results:  April 10, 2010
* Submission of description papers:  April 17, 2010
* Notification of acceptance:        May 6, 2010
* SemEval'2010 workshop (at ACL):    July 15-16, 2010


=================
 Task Organizers
=================

Iris Hendrickx        University of Lisbon, University of Antwerp
Su Nam Kim            University of Melbourne
Zornitsa Kozareva     University of Southern California,
                      Information Sciences Institute
Preslav Nakov         National University of Singapore
Diarmuid Ó Séaghdha   University of Cambridge
Sebastian Padó        University of Stuttgart
Marco Pennacchiotti   Saarland University, Yahoo! Research
Lorenza Romano        FBK-irst, Italy
Stan Szpakowicz       University of Ottawa


==============
 Useful Links
==============

Interested in participating in the shared task? Please join the following
Google group:
http://groups.google.com.sg/group/semeval-2010-multi-way-classification-of-semantic-relations?hl=en

Task #8 website: http://docs.google.com/View?docid=dfvxd49s_36c28v9pmw

SemEval 2010 website: http://semeval2.fbk.eu/semeval2.php

