[Corpora-List] Second Call for Participation: Cross-lingual Textual Entailment for Content Synchronization at SemEval-2012

Mon Dec 19 16:54:15 UTC 2011

_________________________________________________________________________________________________________

SECOND CALL FOR PARTICIPATION: Cross-lingual Textual Entailment for Content Synchronization at SemEval-2012 (CLTE at SemEval-2012)

UPDATE:

-    THE CLTE TRAINING SET AND THE EVALUATION SCRIPTS ARE NOW AVAILABLE!

-    FOR FURTHER INFORMATION ON HOW TO OBTAIN THEM, PLEASE VISIT http://www.cs.york.ac.uk/semeval-2012/task8/index.php?id=data

--------

We invite participants to a new SemEval-2012 task: Cross-lingual Textual Entailment (CLTE) for Content Synchronization.

http://www.cs.york.ac.uk/semeval-2012/task8/

Given a pair of topically related text fragments (T1 and T2) in different languages, the CLTE task consists of automatically annotating the pair with one of the following entailment judgments:

- Bidirectional (T1 entails T2; T2 entails T1)

- Forward (T1 entails T2; T2 does not entail T1)

- Backward (T1 does not entail T2; T2 entails T1)

- No Entailment (T1 does not entail T2; T2 does not entail T1)
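Each of the four judgments is fully determined by two directional decisions (does T1 entail T2, and does T2 entail T1). The Python sketch below shows that combination step only; the directional decisions themselves would come from whatever CLTE component a participant builds, and the lowercase label strings are placeholders, not the official dataset format (see the Task Guidelines for that).

```python
from enum import Enum


class CLTELabel(Enum):
    """The four CLTE judgments; string values are placeholders, not the official format."""
    BIDIRECTIONAL = "bidirectional"
    FORWARD = "forward"
    BACKWARD = "backward"
    NO_ENTAILMENT = "no_entailment"


def combine_directions(t1_entails_t2: bool, t2_entails_t1: bool) -> CLTELabel:
    """Combine two directional entailment decisions into one CLTE label."""
    if t1_entails_t2 and t2_entails_t1:
        return CLTELabel.BIDIRECTIONAL
    if t1_entails_t2:
        return CLTELabel.FORWARD        # T1 entails T2 only
    if t2_entails_t1:
        return CLTELabel.BACKWARD       # T2 entails T1 only
    return CLTELabel.NO_ENTAILMENT      # neither direction holds


# Example: T1 entails T2, but T2 does not entail T1.
print(combine_directions(True, False).value)  # forward
```

Framed this way, a system can be built from a single directional entailment classifier applied twice, once per direction.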

Datasets are available for the following language combinations:

- Spanish/English

- German/English

- Italian/English

- French/English

The CLTE task addresses textual entailment recognition along a new dimension (cross-linguality) and within a new, challenging application scenario (content synchronization).

Cross-linguality represents a dimension of the TE recognition problem that has so far been only partially investigated. The great potential of integrating monolingual TE recognition components into NLP architectures has been reported in several areas, including question answering, information retrieval, information extraction, and document summarization. However, mainly due to the absence of CLTE recognition components, similar improvements have not yet been achieved in any cross-lingual application. The CLTE task aims to prompt research to fill this gap.

Content synchronization represents a challenging application scenario for testing the capabilities of advanced NLP systems. Given two documents about the same topic written in different languages (e.g., Wikipedia articles), the task consists of automatically detecting and resolving differences in the information they provide, in order to produce aligned, mutually enriched versions of the two documents. Towards this ambitious objective, a crucial requirement is to identify the information in one document that is equivalent or novel (more informative) with respect to the content of the other. The task can naturally be cast as an entailment-related problem, where bidirectional and unidirectional entailment judgments for two text fragments are mapped, respectively, into judgments about semantic equivalence and novelty. Alternatively, the task can be seen as a Machine Translation evaluation problem, where judgments about semantic equivalence and novelty relate to the possibility that one text fragment is the full or partial translation of the other.
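The mapping from entailment judgments to synchronization judgments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label strings are placeholders, and since the mapping described above covers only bidirectional and unidirectional entailment, the "No Entailment" case is deliberately left as "undetermined" rather than guessed.

```python
# Placeholder label strings; the official format is defined in the Task Guidelines.
SYNC_JUDGMENT = {
    "bidirectional": "equivalent",   # T1 and T2 convey the same information
    "forward": "novel in T1",        # T1 entails T2 but also says more than T2
    "backward": "novel in T2",       # T2 entails T1 but also says more than T1
}


def synchronization_judgment(label: str) -> str:
    """Map a CLTE label to a content-synchronization judgment."""
    # Assumption: "no_entailment" falls outside the mapping described in the
    # announcement, so it (and any unknown label) is reported as undetermined.
    return SYNC_JUDGMENT.get(label, "undetermined")


print(synchronization_judgment("backward"))  # novel in T2
```

In a synchronization setting, a "novel in T1" judgment would flag content in the T1 document as a candidate for adding to the T2 document, and vice versa.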

The Task Guidelines are available at: http://www.cs.york.ac.uk/semeval-2012/task8/index.php?id=guidelines.

Proposed schedule:

* September 1, 2011: Trial Dataset released (40 English/Spanish pairs)

* December 16, 2011: Training data and evaluation scripts released

* February 10, 2012: Test data release

* February 20, 2012: Task submissions deadline

* March 1, 2012: Release of individual results

* March 10, 2012: System reports due to organizers

* March 25, 2012: Paper reviews due to participants

* April 1, 2012: Camera-ready deadline

If you are interested in the task, please join the discussion group http://groups.google.com/group/clte-semeval.

Best regards,

The CLTE track organizers

Matteo Negri, Yashar Mehdad, Luisa Bentivogli (FBK-irst, Trento, Italy)

Danilo Giampiccolo, Alessandro Marchetti (CELCT, Trento, Italy)

_________________________________________________________________________________________

Danilo Giampiccolo       CELCT - Center for the Evaluation of Language and Communication Technologies
+39 0461 314  874           via alla Cascata 56/c
www.celct.it                    38123 POVO TN Italy
_________________________________________________________________________________________
