[Corpora-List] SEMEVAL 2014 Task 1: SICK-er SECOND CALL FOR PARTICIPATION

Thu Jan 23 08:47:26 UTC 2014

SEMEVAL 2014 Task 1: SECOND CALL FOR PARTICIPATION

Evaluation of compositional distributional semantic models on full
sentences through semantic relatedness and textual entailment

http://alt.qcri.org/semeval2014/task1/

INTRODUCTION

Distributional Semantic Models (DSMs) approximate the meaning of words
with vectors summarizing their patterns of co-occurrence in
corpora. Recently, several compositional extensions of DSMs
(Compositional DSMs, or CDSMs) have been proposed, with the purpose of
representing the meaning of phrases and sentences by composing the
distributional representations of the words they contain. Despite the
ever-increasing interest in the field, the development of adequate
benchmarks for CDSMs, especially at the sentence level, still lags
behind.
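
As a minimal illustration of the idea of composing word vectors into a
sentence representation, the sketch below uses vector addition, one of
the simplest composition functions, and compares sentences by cosine
similarity. The toy vectors and the function names (compose_additive,
cosine) are invented for this example and are not part of the task
materials; additive composition is only a baseline and ignores word
order.

```python
import math

# Toy 3-dimensional co-occurrence vectors. The numbers are invented for
# illustration and are not derived from any corpus.
TOY_VECTORS = {
    "dog": [2.0, 0.0, 1.0],
    "cat": [1.0, 0.0, 2.0],
    "runs": [0.0, 3.0, 1.0],
    "sleeps": [0.0, 1.0, 3.0],
}


def compose_additive(words, vectors):
    """Compose a sentence vector by summing its word vectors.

    Words missing from `vectors` contribute nothing to the sum.
    """
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for word in words:
        for i, value in enumerate(vectors.get(word, [0.0] * dims)):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vec_a = compose_additive("dog runs".split(), TOY_VECTORS)
vec_b = compose_additive("cat runs".split(), TOY_VECTORS)
vec_c = compose_additive("cat sleeps".split(), TOY_VECTORS)

sim_ab = cosine(vec_a, vec_b)  # "dog runs" vs "cat runs"
sim_ac = cosine(vec_a, vec_c)  # "dog runs" vs "cat sleeps"
print(f"dog runs / cat runs:   {sim_ab:.3f}")
print(f"dog runs / cat sleeps: {sim_ac:.3f}")
```

With these toy values, the pair sharing a verb comes out as more
similar than the pair sharing no words.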

SICK (Sentences Involving Compositional Knowledge) is an English data
set including 10,000 sentence pairs that are rich in the lexical,
syntactic and semantic phenomena that CDSMs are expected to account
for (e.g., contextual synonymy and other lexical variation phenomena,
active/passive and other syntactic alternations, word order effects,
impact of negation, determiners and other grammatical elements), but
do not require dealing with other aspects of existing sentential data
sets (complex tokenization issues, idiomatic multiword expressions,
named entities, telegraphic language) that are not within the scope of
current compositional distributional semantics. Sentence pairs were
built starting from
http://nlp.cs.illinois.edu/HockenmaierGroup/data.html and
http://www.cs.york.ac.uk/semeval-2012/task6/index.php?id=data, and
have been annotated both for relatedness in meaning and for the
entailment relation between the two sentences. The sentence
relatedness score provides a direct way to evaluate CDSMs, insofar as
their outputs are meant to quantify the degree of semantic relatedness
between sentences. On the other hand, detecting the presence of
entailment is one of the traditional benchmarks of a successful
semantic system: CDSMs are thus also expected to predict entailment
judgments, at least to some extent.

TASK DESCRIPTION

The challenge involves two sub-tasks:

a) predicting the degree of relatedness between two sentences

b) detecting the entailment relation holding between them

Participants can submit system runs for one or both sub-tasks.

While we especially encourage developers of CDSMs to test their
methods on SICK, developers of other kinds of systems that can tackle
sentence relatedness or entailment tasks (e.g., full-fledged RTE
systems) are also welcome to submit their output.
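
To make the two sub-tasks concrete, here is a rough sketch of how
system output could be compared against gold annotations. This call
does not state the evaluation measures (see the task website for the
official criteria), so the choice of Pearson correlation for
relatedness and accuracy for entailment is an assumption, and the
scores, the label names and the function names below are invented for
illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical gold annotations and system output for four sentence pairs.
gold_scores = [4.5, 1.2, 3.0, 2.1]
system_scores = [4.0, 1.5, 3.5, 2.0]
gold_labels = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
system_labels = ["ENTAILMENT", "NEUTRAL", "NEUTRAL", "NEUTRAL"]

relatedness_r = pearson(gold_scores, system_scores)  # sub-task (a)
entailment_acc = accuracy(gold_labels, system_labels)  # sub-task (b)
print(f"relatedness correlation: {relatedness_r:.3f}")
print(f"entailment accuracy:     {entailment_acc:.2f}")
```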

IMPORTANT DATES

- Evaluation period: March 15-30, 2014

- Paper submission due: April 30, 2014

- SemEval workshop: August 23-24, 2014, co-located with COLING and *SEM
  in Dublin, Ireland

CONTACTS

Further details about the task, the data set and the evaluation
criteria can be found on the SemEval website, where the trial and
training data can also be downloaded:
http://alt.qcri.org/semeval2014/task1/

If you are interested in participating, join our mailing list:
https://groups.google.com/forum/#!forum/cdsm-semeval

ORGANIZERS

Marco Marelli, University of Trento, Italy
Stefano Menini, Fondazione Bruno Kessler, Italy
Marco Baroni, University of Trento, Italy
Luisa Bentivogli, Fondazione Bruno Kessler, Italy
Raffaella Bernardi, University of Trento, Italy
Roberto Zamparelli, University of Trento, Italy
