24.194, FYI: Cross-Lingual Textual Entailment for Content Synchronization at SemEval 2013 - Task 8

Fri Jan 11 19:13:53 UTC 2013

LINGUIST List: Vol-24-194. Fri Jan 11 2013. ISSN: 1069 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
         Monica Macaulay, U of Wisconsin Madison
         Rajiv Rao, U of Wisconsin Madison
         Joseph Salmons, U of Wisconsin Madison
         Anja Wanner, U of Wisconsin Madison
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Brent Miller <brent at linguistlist.org>
================================================================  

Date: Fri, 11 Jan 2013 14:13:50
From: Danilo Giampiccolo [giampiccolo at celct.it]
Subject: Cross-Lingual Textual Entailment for Content Synchronization at SemEval 2013 - Task 8

Apologies for cross-posting.
Please circulate to any potentially interested parties.

Second Call for Participation: Cross-Lingual Textual Entailment for Content
Synchronization (CLTE)
(SemEval-2013 Task 8)

CLTE website: http://www.cs.york.ac.uk/semeval-2013/task8/
CLTE discussion group: http://groups.google.com/group/clte-semeval

Following the successful debut in 2012 [Negri et al., 2012], we are pleased
to invite participants to the second round of the Cross-Lingual Textual
Entailment task (CLTE) at SemEval 2013, co-located with the *SEM and
NAACL-2013 conferences.

CLTE addresses textual entailment (TE) recognition in a cross-lingual setting,
within the challenging application scenario of content synchronization. The
potential of integrating monolingual TE recognition components into NLP
architectures has been reported in several areas, including question
answering, information retrieval, information extraction, and document
summarization. However, mainly due to the absence of cross-lingual TE (CLTE)
recognition components, similar improvements have not yet been achieved in
any cross-lingual application. The CLTE task aims to prompt research to fill
this gap.

Content synchronization represents an ideal application scenario to test the
capabilities of advanced NLP systems. Given two documents about the same topic
written in different languages (e.g. Wikipedia articles), the task consists of
automatically detecting and resolving differences in the information they
provide, in order to produce aligned, mutually enriched versions of the two
documents. Towards this objective, a crucial requirement is to identify the
information in one page that is equivalent or novel (more informative) with
respect to the content of the other. The task can be naturally cast as an
entailment-related problem, where bidirectional and unidirectional entailment
judgments for two text fragments are respectively mapped into judgments about
semantic equivalence and novelty. Alternatively, the task can be seen as a
Machine Translation problem, where judgments about semantic equivalence and
novelty depend on whether one text fragment can be fully or partially
translated into the other.

Task Description:

Given a pair of topically related text fragments (T1 and T2) in different
languages, the CLTE task consists of automatically annotating it with one of
the following entailment judgments:

- Bidirectional (T1 -> T2 & T1 <- T2): the two fragments entail each other
(semantic equivalence);
- Forward (T1 -> T2 & T1 !<- T2): unidirectional entailment from T1 to T2;
- Backward (T1 !-> T2 & T1 <- T2): unidirectional entailment from T2 to T1;
- No Entailment (T1 !-> T2 & T1 !<- T2): there is no entailment between T1 and
T2.

In this task, both T1 and T2 are assumed to be TRUE statements; hence in the
dataset there are no contradictory pairs.
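
The four judgments above are simply the four combinations of two directional
decisions. A minimal Python sketch of that mapping follows. The function name
and its boolean parameters are hypothetical, the label strings are taken from
the "entailment" attribute values used in this posting's example, and how each
directional decision is obtained is left entirely open.

```python
def clte_label(t1_entails_t2: bool, t2_entails_t1: bool) -> str:
    """Combine two directional entailment decisions into one CLTE judgment.

    Hypothetical helper: the task only defines the four output categories,
    not how a system decides each direction.
    """
    if t1_entails_t2 and t2_entails_t1:
        return "bidirectional"   # T1 -> T2 & T1 <- T2
    if t1_entails_t2:
        return "forward"         # T1 -> T2 & T1 !<- T2
    if t2_entails_t1:
        return "backward"        # T1 !-> T2 & T1 <- T2
    return "no_entailment"       # T1 !-> T2 & T1 !<- T2
```

A system that makes a single four-way decision does not need this step; it
only illustrates how the categories relate to the two entailment directions.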

Examples:

<entailment-corpus languages="spa-eng">
          <pair id="1" entailment="bidirectional">
                    <t1>Mozart nació en la ciudad de Salzburgo</t1>
                    <t2>Mozart was born in Salzburg.</t2>
          </pair>
          <pair id="2" entailment="forward">
                    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
                    <t2>Mozart was born in 1756 in the city of Salzburg.</t2>
          </pair>
          <pair id="3" entailment="backward">
                    <t1>Mozart nació en la ciudad de Salzburgo</t1>
                    <t2>Mozart was born on 27th January 1756 in Salzburg.</t2>
          </pair>
          <pair id="4" entailment="no_entailment">
                    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
                    <t2>Mozart was born to Leopold and Anna Maria Pertl Mozart.</t2>
          </pair>
</entailment-corpus>
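
For participants who want to load data in this format, the following is a
minimal Python sketch using only the standard library. It assumes that the
distributed files are well-formed XML with plain ASCII quotes around attribute
values and the element and attribute names shown in the example; the function
name read_pairs and the SAMPLE string are illustrative only.

```python
import xml.etree.ElementTree as ET

# Two pairs copied from the example in this posting (assumed file layout).
SAMPLE = """<entailment-corpus languages="spa-eng">
  <pair id="1" entailment="bidirectional">
    <t1>Mozart nació en la ciudad de Salzburgo</t1>
    <t2>Mozart was born in Salzburg.</t2>
  </pair>
  <pair id="2" entailment="forward">
    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
    <t2>Mozart was born in 1756 in the city of Salzburg.</t2>
  </pair>
</entailment-corpus>"""


def read_pairs(xml_text):
    """Return the language-pair code and a list of dicts, one per <pair>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            "label": pair.get("entailment"),
            # Strip stray whitespace around the fragment text.
            "t1": (pair.findtext("t1") or "").strip(),
            "t2": (pair.findtext("t2") or "").strip(),
        })
    return root.get("languages"), pairs


languages, pairs = read_pairs(SAMPLE)
for p in pairs:
    print(languages, p["id"], p["label"])
```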

Dataset:

The dataset consists of about 1,700 cross-lingual entailment pairs: 1,000 for
development (i.e. the CLTE 2012 development and test data) and about 700 for
test. It is balanced with respect to the 4 entailment judgments
(bidirectional, forward, backward, and no entailment).

Datasets will be available for the following language combinations:
- English/Spanish
- English/German 
- English/French 
- English/Italian 

Evaluation:

System results will be compared to the human-annotated gold standard. The
evaluation metric will be accuracy, i.e. the proportion of correct judgments
out of the total number of judgments returned by a system.

Accuracy figures will be provided both for the whole test set and for each of
the 4 entailment judgment categories taken separately.
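
As a rough illustration of the metric, the Python sketch below computes
overall accuracy and a per-category breakdown. It is not the official scorer.
It assumes a system returns exactly one judgment per gold pair, and it assumes
that per-category accuracy is measured over the gold pairs of each category;
the function name and label strings are illustrative.

```python
from collections import Counter

LABELS = ("bidirectional", "forward", "backward", "no_entailment")


def accuracy_report(gold, predicted):
    """Return (overall accuracy, {label: accuracy over gold pairs of that label})."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = Counter(gold)
    # Count correct judgments, grouped by their gold label.
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    overall = sum(correct.values()) / len(gold) if gold else 0.0
    per_label = {
        label: correct[label] / total[label]
        for label in LABELS
        if total[label]  # skip categories absent from the gold data
    }
    return overall, per_label


gold = ["bidirectional", "forward", "backward", "no_entailment"]
pred = ["bidirectional", "forward", "forward", "no_entailment"]
overall, per_label = accuracy_report(gold, pred)
print(overall, per_label)
```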

Schedule:

- November 1, 2012: Full Training Data available for participants
- February 15, 2013: Registration Deadline [for Task Participants]
- March 1, 2013: Test data release
- March 8, 2013: Task submissions deadline
- March 15, 2013: Release of individual results
- April 9, 2013: Paper submission deadline [TBC]
- April 23, 2013: Reviews due [TBC]
- May 4, 2013: Camera-ready due [TBC]
- June 13-14, 2013: SemEval 2013 Workshop (co-located with *SEM and NAACL,
Atlanta, USA)

Task Organizers:

- Matteo Negri,  FBK-irst, Trento, Italy, negri [at] fbk.eu (CONTACT)
- Yashar Mehdad, The University of British Columbia, ymahdad [at] gmail.com
- Luisa Bentivogli, FBK-irst, Trento, Italy, bentivo [at] fbk.eu
- Danilo Giampiccolo, CELCT, Italy, giampiccolo [at] celct.it
- Alessandro Marchetti, CELCT, Italy, amarchetti [at] celct.it

References:

M. Negri, A. Marchetti, Y. Mehdad, L. Bentivogli and D. Giampiccolo, 2012.
SemEval-2012 Task 8: Cross-lingual Textual Entailment for Content
Synchronization. In Proceedings of *SEM 2012 (.pdf:
http://ixa2.si.ehu.es/starsem/proc/pdf/STARSEM-SEMEVAL053.pdf).

Links:

- CLTE mailing list: http://groups.google.com/group/clte-semeval
- CLTE task website: http://www.cs.york.ac.uk/semeval-2013/task8/
- SemEval 2013 website: http://www.cs.york.ac.uk/semeval-2013/
- SemEval discussion group: http://groups.google.com/group/semeval3
- NAACL 2013 website: http://naacl2013.naacl.org/

Linguistic Field(s): Computational Linguistics

Subject Language(s): English (eng)
                     French (fra)
                     German (deu)
                     Italian (ita)
                     Spanish (spa)
