[Corpora-List] Call for Participation: CROSS-LINGUAL TEXTUAL ENTAILMENT FOR CONTENT SYNCHRONIZATION (CLTE) - [SemEval-2013 Task 8]

Fri Nov 23 10:38:02 UTC 2012

Please distribute widely - Apologies for cross-posting

*******************************************

CALL FOR PARTICIPATION: CROSS-LINGUAL TEXTUAL ENTAILMENT FOR CONTENT SYNCHRONIZATION (CLTE)
(SemEval-2013 Task 8)

CLTE website: http://www.cs.york.ac.uk/semeval-2013/task8/
CLTE discussion group: http://groups.google.com/group/clte-semeval

Following the successful debut in 2012 [Negri et al., 2012], we are pleased to invite participants to the second round of the Cross-Lingual Textual Entailment (CLTE) task at SemEval 2013, co-located with the *SEM and NAACL-2013 conferences.

CLTE addresses textual entailment (TE) recognition in a cross-lingual setting, within the challenging application scenario of content synchronization. The great potential of integrating monolingual TE recognition components into NLP architectures has been reported in several areas, including question answering, information retrieval, information extraction, and document summarization. However, mainly due to the absence of cross-lingual TE (CLTE) recognition components, similar improvements have not yet been achieved in any cross-lingual application. The CLTE task aims to prompt research that fills this gap.

Content synchronization represents an ideal application scenario to test the capabilities of advanced NLP systems. Given two documents about the same topic written in different languages (e.g. Wikipedia articles), the task consists of automatically detecting and resolving differences in the information they provide, in order to produce aligned, mutually enriched versions of the two documents. To this end, a crucial requirement is to identify the information in one page that is equivalent or novel (more informative) with respect to the content of the other. The task can be naturally cast as an entailment-related problem, where bidirectional and unidirectional entailment judgments for two text fragments are mapped to judgments about semantic equivalence and novelty, respectively. Alternatively, the task can be seen as a Machine Translation problem, where judgments about semantic equivalence and novelty depend on whether one text fragment can be fully or partially translated into the other.

----------------------------
TASK DESCRIPTION
----------------------------
Given a pair of topically related text fragments (T1 and T2) in different languages, the CLTE task consists of automatically annotating it with one of the following entailment judgments:

- Bidirectional (T1 -> T2 & T1 <- T2): the two fragments entail each other (semantic equivalence);
- Forward (T1 -> T2 & T1 !<- T2): unidirectional entailment from T1 to T2;
- Backward (T1 !-> T2 & T1 <- T2): unidirectional entailment from T2 to T1;
- No Entailment (T1 !-> T2 & T1 !<- T2): there is no entailment between T1 and T2.

In this task, both T1 and T2 are assumed to be TRUE statements; hence in the dataset there are no contradictory pairs.
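
The four judgments above can be read as the combination of two directional entailment decisions. The following is a minimal sketch of that mapping, not part of the task definition: the function name clte_judgment and its boolean arguments are illustrative assumptions, and the label strings follow the "entailment" attribute values used in the sample corpus.

```python
def clte_judgment(t1_entails_t2, t2_entails_t1):
    """Map two directional entailment decisions to one CLTE label.

    Hypothetical helper: how each directional decision is obtained
    (e.g. from a classifier) is outside the scope of this sketch.
    """
    if t1_entails_t2 and t2_entails_t1:
        return "bidirectional"   # T1 -> T2 and T1 <- T2
    if t1_entails_t2:
        return "forward"         # T1 -> T2 only
    if t2_entails_t1:
        return "backward"        # T1 <- T2 only
    return "no_entailment"       # neither direction holds


# One call per combination of directional decisions.
for forward_holds in (True, False):
    for backward_holds in (True, False):
        print(forward_holds, backward_holds,
              clte_judgment(forward_holds, backward_holds))
```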

Examples:

<entailment-corpus languages="spa-eng">
          <pair id="1" entailment="bidirectional">
                    <t1>Mozart nació en la ciudad de Salzburgo</t1>
                    <t2>Mozart was born in Salzburg.</t2>
          </pair>
          <pair id="2" entailment="forward">
                    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
                    <t2>Mozart was born in 1756 in the city of Salzburg.</t2>
          </pair>
          <pair id="3" entailment="backward">
                    <t1>Mozart nació en la ciudad de Salzburgo</t1>
                    <t2>Mozart was born on 27th January 1756 in Salzburg.</t2>
          </pair>
          <pair id="4" entailment="no_entailment">
                    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
                    <t2>Mozart was born to Leopold and Anna Maria Pertl Mozart.</t2>
          </pair>
</entailment-corpus>
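
For participants who want to load pairs in this format, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the released files follow the layout of the sample above (a root element with a "languages" attribute and "pair" elements holding "t1" and "t2"); the function name read_pairs and the dictionary keys are illustrative choices, not part of the task distribution.

```python
import xml.etree.ElementTree as ET

# Two pairs copied from the sample corpus above.
SAMPLE = """<entailment-corpus languages="spa-eng">
  <pair id="1" entailment="bidirectional">
    <t1>Mozart nació en la ciudad de Salzburgo</t1>
    <t2>Mozart was born in Salzburg.</t2>
  </pair>
  <pair id="2" entailment="forward">
    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
    <t2>Mozart was born in 1756 in the city of Salzburg.</t2>
  </pair>
</entailment-corpus>"""


def read_pairs(xml_text):
    """Return (language pair, list of pair records) from a CLTE-style XML string."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            "label": pair.get("entailment"),
            # Strip stray whitespace around the fragment text.
            "t1": (pair.findtext("t1") or "").strip(),
            "t2": (pair.findtext("t2") or "").strip(),
        })
    return root.get("languages"), pairs


languages, pairs = read_pairs(SAMPLE)
for record in pairs:
    print(languages, record["id"], record["label"])
```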

----------------------------
DATASET
----------------------------
The dataset consists of about 1,700 cross-lingual entailment pairs (1,000 for development, i.e. the CLTE 2012 development and test data, and about 700 for test), balanced across the 4 entailment judgments (bidirectional, forward, backward, and no entailment).

Datasets will be available for the following language combinations:
- English/Spanish
- English/German 
- English/French 
- English/Italian 

----------------------------
EVALUATION
----------------------------
System results will be compared to the human-annotated gold standard. The evaluation metric will be accuracy, i.e. the proportion of correct judgments out of the total number of judgments returned by the system.

Accuracy figures will be provided both for the whole test set and for each of the 4 entailment judgment categories taken separately.
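
As an illustration of the metric, the sketch below computes overall and per-category accuracy from two parallel lists of labels. It is not the official scorer: the function name accuracy_report is hypothetical, and it assumes that per-category accuracy is computed over the pairs whose gold label belongs to that category.

```python
from collections import Counter

LABELS = ("bidirectional", "forward", "backward", "no_entailment")


def accuracy_report(gold, predicted):
    """Return (overall accuracy, per-category accuracy keyed by gold label)."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    total = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    overall = sum(correct.values()) / len(gold)
    # Assumption: categories absent from the gold labels are skipped.
    per_label = {label: correct[label] / total[label]
                 for label in LABELS if total[label]}
    return overall, per_label


gold = ["bidirectional", "forward", "backward", "no_entailment"]
predicted = ["bidirectional", "forward", "forward", "no_entailment"]
overall, per_label = accuracy_report(gold, predicted)
print(overall)     # 0.75
print(per_label)
```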

----------------------------
PROPOSED SCHEDULE
----------------------------
November 1, 2012:   Full Training Data available for participants
February 15, 2013:  Registration Deadline [for Task Participants]
March 1, 2013:      Test data release
March 8, 2013:      Task submissions deadline
March 15, 2013:     Release of individual results
April 9, 2013:      Paper submission deadline [TBC]
April 23, 2013:     Reviews due [TBC]
May 4, 2013:        Camera-ready due [TBC]
June 13-14, 2013:   SemEval 2013 Workshop (co-located with *SEM and NAACL, Atlanta, USA)

----------------------------
TASK ORGANIZERS
----------------------------
Matteo Negri,  FBK-irst, Trento, Italy, negri [at] fbk.eu (CONTACT)
Yashar Mehdad, The University of British Columbia, ymahdad [at] gmail.com
Luisa Bentivogli, FBK-irst, Trento, Italy, bentivo [at] fbk.eu
Danilo Giampiccolo, CELCT, Italy, giampiccolo [at] celct.it
Alessandro Marchetti, CELCT, Italy, amarchetti [at] celct.it

----------------------------
REFERENCES
----------------------------

M. Negri, A. Marchetti, Y. Mehdad, L. Bentivogli and D. Giampiccolo. 2012. SemEval-2012 Task 8: Cross-lingual Textual Entailment for Content Synchronization. In Proceedings of *SEM 2012 (.pdf: http://ixa2.si.ehu.es/starsem/proc/pdf/STARSEM-SEMEVAL053.pdf)

----------------------------
LINKS
----------------------------
- CLTE mailing list: http://groups.google.com/group/clte-semeval
- CLTE task website: http://www.cs.york.ac.uk/semeval-2013/task8/
- SemEval 2013 website: http://www.cs.york.ac.uk/semeval-2013/
- SemEval discussion group: http://groups.google.com/group/semeval3
- NAACL 2013 website: http://naacl2013.naacl.org/
