21.1827, FYI: 6th Recognizing Textual Entailment Challenge

Thu Apr 15 17:00:24 UTC 2010

LINGUIST List: Vol-21-1827. Thu Apr 15 2010. ISSN: 1068 - 4875.

Subject: 21.1827, FYI: 6th Recognizing Textual Entailment Challenge

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Monica Macaulay, U of Wisconsin-Madison  
Eric Raimy, U of Wisconsin-Madison  
Joseph Salmons, U of Wisconsin-Madison  
Anja Wanner, U of Wisconsin-Madison  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Elyssa Winzeler <elyssa at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.cfm.

===========================Directory==============================  

1)
Date: 13-Apr-2010
From: Danilo Giampiccolo < giampiccolo at celct.it >
Subject: 6th Recognizing Textual Entailment Challenge

-------------------------Message 1 ---------------------------------- 
Date: Thu, 15 Apr 2010 12:57:34
From: Danilo Giampiccolo [giampiccolo at celct.it]
Subject: 6th Recognizing Textual Entailment Challenge

Sixth Recognizing Textual Entailment Challenge at TAC 2010

http://www.nist.gov/tac/2010/RTE/

The Recognizing Textual Entailment (RTE) task consists of developing a
system that, given two text fragments, can determine whether the meaning of
one text is entailed by, i.e. can be inferred from, the other text.

Since its inception in 2005, RTE has enjoyed steadily growing popularity
in the NLP community. After the first three highly successful PASCAL RTE
Challenge campaigns held in Europe, in 2008 RTE became a track at the Text
Analysis Conference (TAC), bringing it together with communities working on
NLP applications. The interaction has provided the opportunity to apply RTE
to specific application settings and move it towards more realistic
scenarios. In particular, the RTE-5 Pilot Search Task represented a step
forward, as for the first time textual entailment recognition was performed
on a corpus, instead of on isolated T-H pairs, and in the setting of a real
NLP application, namely Summarization.

Encouraged by the positive response obtained so far, the RTE Organizing
Committee is glad to launch the Sixth Recognizing Textual Entailment
Challenge, proposed for the third year as a track of TAC.

Organizations interested in participating in the RTE-6 Challenge are
invited to submit a track registration form by May 21, 2010, at the TAC
2010 web site: 

http://www.nist.gov/tac/2010/

What is new in RTE-6

1) RTE-6 does not include the traditional RTE Main Task which was carried
out in the first five RTE challenges, i.e. there will be no task to make
entailment judgments over isolated T-H pairs drawn from multiple applications.

2) A new Main Task based on only the Summarization application setting is
proposed, together with a subtask:

- Main Task: Recognizing Textual Entailment within a Corpus. 
A close variant of the Pilot Search Task in RTE-5, the RTE-6 Main Task
differs significantly in two ways:

* Unlike in RTE-5, where the Search Task was performed on the whole corpus,
in RTE-6 a preliminary Information Retrieval filtering phase is performed
using Lucene, in order to select for each H a subset of candidate entailing
sentences to be judged by the participating systems.
* In the RTE-6 data set some of the H's have no entailing sentences.

- Novelty Detection subtask. This task has the same structure as the Main
Task, but it is separated out as a subtask to allow participants to
optimize their RTE engines for detecting novelty, i.e. judging whether the
information contained in each H is novel with respect to the information
contained in the corpus. A novel H is defined as one that has no entailing
sentences in the set of candidate T's. Systems' outputs will have the same
format as for the Main Task but will be specifically scored using metrics
designed for assessing novelty detection.

3) A KBP Validation Pilot, set in the Knowledge Base Population scenario,
is also proposed. 

4) The exploratory effort on resource evaluation will also be extended to
tools. Ablation tests for both knowledge resources and tools will be
mandatory for participants in the new RTE-6 Main Task.

RTE-6 Main Task - Recognizing Textual Entailment within a Corpus

In the RTE-6 Main Task, given a corpus, a hypothesis H, and a set of
'candidate' entailing sentences for that H retrieved by Lucene from the
corpus, RTE systems are required to identify all the sentences that entail
H among the candidate sentences.

The RTE6-Main data set is based on the data created for the TAC 2009 Update
Summarization task, consisting of a number of topics, each containing two
sets of documents, namely i) Cluster A, made up of the first 10 texts in
chronological order of publication date, and ii) Cluster B, made up of the
last 10 texts. H's are standalone sentences taken from Cluster B documents,
while candidate entailing sentences (T's) are the 100 top-ranked sentences
retrieved for each H by Lucene from the Cluster A corpus, using H verbatim
as the search query. Only this subset of candidate entailing sentences must
be judged for entailment. However, the sentences are not to be treated as
isolated texts: the entire Cluster A corpus to which they belong must be
taken into consideration in order to resolve discourse references and
appropriately judge the entailment relation.
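
The candidate selection step described above can be pictured with a small
sketch. It is only an illustration under stated assumptions: the organizers
use Lucene, whereas the code below substitutes a simple IDF-weighted word
overlap ranking. The function names (tokenize, top_k_candidates) and the toy
sentences are hypothetical and are not part of the RTE-6 data or tools.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a crude stand-in for Lucene's analyzers)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_k_candidates(hypothesis, sentences, k=100):
    """Return indices of the k best-matching sentences for H, best first.

    Assumption: an IDF-weighted overlap score replaces Lucene's real ranking.
    Sentences that share no word with H are never returned.
    """
    n = len(sentences)
    token_sets = [set(tokenize(s)) for s in sentences]
    # Document frequency: in how many sentences each token occurs.
    doc_freq = Counter(tok for toks in token_sets for tok in toks)
    query = set(tokenize(hypothesis))

    def score(toks):
        # Rare shared words count more than frequent ones.
        return sum(math.log(1 + n / doc_freq[t]) for t in query & toks)

    scores = [score(toks) for toks in token_sets]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0][:k]


# Toy stand-in for the Cluster A corpus (hypothetical, shortened sentences).
cluster_a = [
    "Oil prices fell on Tuesday.",
    "Hurricane Rita headed toward the Gulf of Mexico.",
    "OPEC pledged to supply more crude.",
    "Rita pounded the Florida Keys.",
]
h = "Rita barreled toward the Gulf of Mexico."
candidates = top_k_candidates(h, cluster_a, k=2)
```

In the task itself k is 100 and H is used verbatim as the query. The sketch
keeps the verbatim query but makes no claim about how Lucene scores matches.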

The example below presents a hypothesis referring to a given topic and some
of the entailing sentences found in the subset of candidate sentences (the
first entailing sentence entails H because 'new hurricane' can be seen to
resolve to 'Hurricane Rita' from the context in which it occurs in its
Cluster A document):

<H_sentence> Rita barreled toward the Gulf of Mexico.</H_sentence>
   <text doc_id="AFP_ENG_20050920.0413" s_id="1" s_id="YES">World oil
prices fell further on Tuesday, despite a new hurricane powering towards
oil facilities in the Gulf of Mexico, and as OPEC pledged to supply more
crude from the start of October if required.</text>
   <text doc_id="AFP_ENG_20050920.0614" s_id="11"
s_id="YES">Hurricane Rita barreled near southern Florida islands and
headed toward the Gulf of Mexico, threatening Texas and Louisiana with
winds of 160 kilometers per hour (100 mph).</text>
   <text doc_id="AFP_ENG_20050920.0664" s_id="4" s_id="YES">Hurricane
Rita pounded the fragile Florida Keys islands Tuesday as it barreled toward
the oil-rich Gulf of Mexico.</text>
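
To make the role of discourse context concrete, here is a minimal sketch of
a purely lexical baseline applied to the example above. It is not a method
proposed by the organizers: the word-overlap criterion, the 0.8 threshold,
and the function names are assumptions chosen for illustration only.

```python
import re


def tokens(text):
    """Lowercase word tokens; a stand-in for real linguistic preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(hypothesis, candidate):
    """Fraction of hypothesis tokens that also occur in the candidate."""
    h = tokens(hypothesis)
    return len(h & tokens(candidate)) / len(h) if h else 0.0


def select_entailing(hypothesis, candidates, threshold=0.8):
    """Return the (doc_id, s_id) keys of candidates judged to entail H.

    Assumption: overlap >= threshold is treated as entailment.
    """
    return [key for key, text in candidates
            if overlap_score(hypothesis, text) >= threshold]


H = "Rita barreled toward the Gulf of Mexico."
CANDIDATES = [
    (("AFP_ENG_20050920.0413", "1"),
     "World oil prices fell further on Tuesday, despite a new hurricane "
     "powering towards oil facilities in the Gulf of Mexico, and as OPEC "
     "pledged to supply more crude from the start of October if required."),
    (("AFP_ENG_20050920.0614", "11"),
     "Hurricane Rita barreled near southern Florida islands and headed "
     "toward the Gulf of Mexico, threatening Texas and Louisiana with "
     "winds of 160 kilometers per hour (100 mph)."),
    (("AFP_ENG_20050920.0664", "4"),
     "Hurricane Rita pounded the fragile Florida Keys islands Tuesday as "
     "it barreled toward the oil-rich Gulf of Mexico."),
]
selected = select_entailing(H, CANDIDATES)
```

Under these assumptions the baseline selects the second and third sentences
but misses the first one, which never mentions 'Rita' explicitly. Recovering
it requires resolving 'new hurricane' against the surrounding Cluster A
document, which is exactly the kind of inference the task asks for.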

RTE-6 Novelty Detection Subtask

The Novelty Detection subtask is based on the Main Task and is aimed at
specifically addressing the interests of the Summarization community, in
particular with regard to the Update Summarization task, focusing on
detection of novelty in Cluster B documents. 

The task consists of judging whether the information contained in each H
(drawn from the Cluster B documents) is novel with respect to the
information contained in the set of Cluster A candidate entailing
sentences. If one or more entailing sentences are found for a given H, the
content of that H is not new. Conversely, if no entailing sentences are
detected, the information contained in the H is regarded as novel.
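
Since novelty is defined entirely in terms of the Main Task output, the
decision can be derived mechanically. The sketch below assumes, purely for
illustration, that a system's output is held as a mapping from an H
identifier to the list of sentences it judged as entailing. The identifiers
"H1" and "H2" and the function name are hypothetical.

```python
def novelty_from_entailment(entailing_by_h):
    """Map each H id to True when no candidate sentence entails it."""
    return {h_id: len(sentences) == 0
            for h_id, sentences in entailing_by_h.items()}


# Hypothetical Main Task style output: H id -> list of (doc_id, s_id).
main_task_output = {
    "H1": [("AFP_ENG_20050920.0614", "11")],  # entailed, so not novel
    "H2": [],                                 # nothing entails it, so novel
}
novelty = novelty_from_entailment(main_task_output)
```

The scoring metrics used for novelty detection are not specified here, so
the sketch stops at the novel / not novel decision.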

The Novelty Detection Task requires the same output format as the Main
Task, i.e. no additional type of decision is needed. Nevertheless, the
Novelty Detection Task differs from the Main Task in the following ways:

1) The H's are only a subset of the H's used for the Main Task;
2) The system outputs are scored differently, using specific scoring
metrics designed for assessing novelty detection.

The Main and Novelty Detection Task guidelines for participants, together
with one sample topic taken from the Development Set, are available at the
RTE-6 Website (http://www.nist.gov/tac/2010/RTE/).

RTE-6 KBP Validation Pilot Task

Based on the TAC Knowledge Base Population (KBP) Slot-Filling task, the new
KBP validation pilot task is to determine whether a given relation
(Hypothesis) is supported in an associated document (Text). Each slot fill
that is proposed by a system for the KBP Slot-Filling task would create one
evaluation item for the RTE-KBP Validation Pilot: the Hypothesis would be a
simple sentence created from the slot fill, while the Text would be the
source document that was cited as supporting the slot fill.

The guidelines and the Development Set will be available by the end of
April 2010 at the RTE-6 website (http://www.nist.gov/tac/2010/RTE/).

Resource and Tool Evaluation through Ablation Tests

The exploratory effort on resource evaluation started in RTE-5 will
continue on the new RTE-6 Main Task and will be extended to tools. Ablation
tests are required for systems participating in the new RTE-6 Main Task, in
order to collect data to better understand the impact of both knowledge
resources and tools used by RTE systems and evaluate their contribution to
systems' performance. An ablation test consists of removing one module from
a complete system and rerunning the system on the test set with the
remaining modules. By comparing the results to those obtained by the
complete system, it is possible to assess the practical contribution of the
individual module.
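
The ablation procedure can be sketched as a simple loop. This is a minimal
illustration, not the official evaluation code: the function ablation_report,
the toy system, the module names, and the integer scores are all hypothetical
and stand in for a real RTE system and a real evaluation measure.

```python
def ablation_report(run_system, modules, evaluate):
    """Score drop caused by removing each module in turn.

    run_system(active_modules) -> system output on the test set
    evaluate(output)           -> numeric score (higher is better)
    """
    full_score = evaluate(run_system(modules))
    report = {}
    for module in modules:
        # Rerun with every module except the one being tested.
        remaining = [m for m in modules if m != module]
        report[module] = full_score - evaluate(run_system(remaining))
    return report


# Toy system: each module adds a fixed number of points (hypothetical).
GAINS = {"wordnet": 5, "coreference": 2, "ner": 0}


def toy_run(active_modules):
    return list(active_modules)


def toy_evaluate(output):
    return 40 + sum(GAINS[m] for m in output)


report = ablation_report(toy_run, list(GAINS), toy_evaluate)
```

A positive value means the system performs worse without the module, zero
means the module made no measurable difference, and a negative value would
indicate that the module actually hurt performance.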

The RTE Resource Pool at ACLwiki

http://www.aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool

The RTE Resource Pool, set up for the first time during RTE-3, serves as a
portal and forum for publicizing and tracking resources, and reporting on
their use. All the RTE participants and other members of the NLP community
who develop or use relevant resources are encouraged to contribute to this
important resource.

The RTE Resource Pool has been updated with a section specifically
dedicated to knowledge resources. The new page
(http://www.aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool#Knowledge_Resources)
contains a list of the 'standard' RTE resources, i.e. those that have been
most widely used in the design of RTE systems during the RTE challenges
held so far, together with links to the locations where they are made
available. Furthermore, the results of the ablation tests carried out in
RTE-5, together with their descriptions, are also provided.

Tentative Schedule

April 23	KBP Validation Pilot: Release of Development Set
April 30	Main Task: Release of Development Set
May 21	Deadline for TAC 2010 track registration
September 2	Main Task: Release of Test Set
September 9	Main Task: Deadline for task submissions
September 10	KBP Validation Pilot: Release of Test Set
September 16	Main Task: Release of individual evaluated results
September 17	KBP Validation Pilot: Deadline for task submissions
September 24	Main Task: Deadline for ablation test submissions
September 24	KBP Validation Pilot: Release of individual evaluated results
September 26	Deadline for TAC 2010 workshop presentation proposals
October 1	Main Task: Release of individual ablation test results
October 20	Deadline for systems' reports

Track Coordinators and Organizers

Luisa Bentivogli, CELCT and FBK, Italy (Track coordinator, bentivo at fbk.eu) 
Danilo Giampiccolo, CELCT, Italy (Track coordinator, giampiccolo at celct.it)
Hoa Trang Dang, NIST, USA
Ido Dagan, Bar Ilan University, Israel
Peter Clark, Boeing, USA 

Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics

-----------------------------------------------------------
LINGUIST List: Vol-21-1827