[Corpora-List] SemEval-2010 Task #8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals -- Call for Participation
Preslav Nakov (GMail)
preslavn at gmail.com
Wed Feb 17 08:56:03 UTC 2010
*******************************************************
Call for Participation
SemEval-2010 Shared Task #8:
Multi-Way Classification of Semantic Relations
Between Pairs of Nominals
http://docs.google.com/View?docid=dfvxd49s_36c28v9pmw
--- Trial data available ---
*******************************************************
This shared task should be of interest to researchers working on
* semantic relation extraction
* information extraction
* lexical semantics
============
Background
============
Recently, the NLP community has shown a renewed interest in deeper semantic
analysis, including automatic recognition of semantic relations between
pairs of words. This is an important task with many potential applications
in Information Retrieval, Information Extraction, Text Summarization,
Machine Translation, Question Answering, Paraphrasing, Recognizing Textual
Entailment, Thesaurus Construction, Semantic Network Construction, Word
Sense Disambiguation, and Language Modelling.
Despite this interest, progress has been slow due to the incompatibility of the
different classification schemes proposed and used, which has made it difficult
to compare the various classification algorithms. Moreover, most of the datasets
used so far provided no context for the target relation, relying on the
unrealistic assumption that semantic relations are largely context-independent.
A notable exception is SemEval-2007 Task 4:
Classification of Semantic Relations between Nominals, which for the first
time provided a standard benchmark dataset for seven semantic relations *in
context*. However, in order to avoid the challenge of defining a single
unified standard classification scheme, this dataset treated each semantic
relation separately, as a single two-class (positive vs. negative)
classification task, rather than as multi-way classification. While some
subsequent publications tried to use the dataset in a multi-way setup,
it was not designed to be used in that manner.
We believe that having a freely available standard benchmark dataset for
*multi-way* semantic relation classification *in context* is much needed for
the overall advancement of the field. Thus, our primary objective has been
the challenging task of preparing such a dataset and releasing it to the
research community. We have further set up a common evaluation task that will
enable researchers to compare their algorithms.
==========
The Task
==========
Task: Given a sentence and two annotated nominals, choose the most suitable
relation from the following inventory of nine relations:
* Relation 1 (Cause-Effect)
* Relation 2 (Instrument-Agency)
* Relation 3 (Product-Producer)
* Relation 4 (Content-Container)
* Relation 5 (Entity-Origin)
* Relation 6 (Entity-Destination)
* Relation 7 (Component-Whole)
* Relation 8 (Member-Collection)
* Relation 9 (Message-Topic)
It is also possible to choose Other if none of the nine relations appears to
be suitable.
Example: The best choice for the following sentence would be
Component-Whole(e1,e2):
"The <e1>macadamia nuts</e1> in the <e2>cake</e2> also make it necessary to
have a very sharp knife to cut through the cake neatly."
Note that in the above sentence, Component-Whole(e1,e2) holds, but
Component-Whole(e2,e1) does not, i.e., we have Other(e2,e1). Thus, the task
requires determining *both* the relation and the order of e1 and e2 as its
arguments.
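For illustration only, here is a minimal sketch in Python (not the official
data format or scoring tool, which will be distributed by the organizers) of
how the <e1>/<e2> markup shown in the example above might be parsed and how a
directed label might be represented; the helper name and the regular
expressions are our own assumptions.

import re

# Hypothetical helper for illustration; the official data format and scorer
# are defined by the task organizers, not by this sketch.
def parse_nominals(sentence):
    """Extract the two annotated nominals from the <e1>...</e1> and <e2>...</e2> tags."""
    e1 = re.search(r"<e1>(.*?)</e1>", sentence).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", sentence).group(1)
    return e1, e2

sentence = ("The <e1>macadamia nuts</e1> in the <e2>cake</e2> also make it "
            "necessary to have a very sharp knife to cut through the cake neatly.")
e1, e2 = parse_nominals(sentence)

# A complete answer pairs a relation with the argument order, e.g.
# Component-Whole(e1,e2) for the example above, or Other(e2,e1) for the
# reversed argument order.
label = "Component-Whole(e1,e2)"
print(e1, e2, label)   # macadamia nuts  cake  Component-Whole(e1,e2)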
==========
Datasets
==========
* Trial Dataset: A trial dataset was released on August 30, 2009; it contains
data for the first five relations listed above. However, it also contains some
references to the other four relations, which can be treated as Other when
experimenting with the trial dataset.
* Training Dataset: The training dataset consists of about 700 examples for
each of the nine relations and for the additional Other relation; a total of
about 7,000 examples.
* Development Dataset: The development dataset consists of about 100
examples for each of the nine relations and for the additional Other
relation; a total of about 1,000 examples.
* Test Dataset: The test dataset contains about 200 examples for each of the
nine relations and for the additional Other relation; a total of about 2,000
examples.
License: All data are released under the Creative Commons Attribution 3.0
Unported license.
===============
Time Schedule
===============
* Trial data released: August 30, 2009
* Training+development data release: February 26, 2010
* Test data release: March 18, 2010
* Result submission deadline: 7 days after downloading the *test*
data, but no later than April 2, 2010
* Organizers send the test results: April 10, 2010
* Submission of description papers: April 17, 2010
* Notification of acceptance: May 6, 2010
* SemEval'2010 workshop (at ACL): July 15-16, 2010
=================
Task Organizers
=================
Iris Hendrickx University of Lisbon, University of Antwerp
Su Nam Kim University of Melbourne
Zornitsa Kozareva University of Southern California, Information
Sciences Institute
Preslav Nakov National University of Singapore
Diarmuid Ó Séaghdha University of Cambridge
Sebastian Padó Stuttgart University
Marco Pennacchiotti Saarland University, Yahoo! Research
Lorenza Romano FBK-irst, Italy
Stan Szpakowicz University of Ottawa
==============
Useful Links
==============
Interested in participating in the shared task? Please join the following
Google group:
http://groups.google.com.sg/group/semeval-2010-multi-way-classification-of-semantic-relations?hl=en
Task #8 website: http://docs.google.com/View?docid=dfvxd49s_36c28v9pmw
SemEval 2010 website: http://semeval2.fbk.eu/semeval2.php
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora