17.3066, FYI: RTE 3 Preliminary Announcement
LINGUIST Network
linguist at LINGUISTLIST.ORG
Wed Oct 18 14:52:33 UTC 2006
LINGUIST List: Vol-17-3066. Wed Oct 18 2006. ISSN: 1068-4875.
Subject: 17.3066, FYI: RTE 3 Preliminary Announcement
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Laura Welcher, Rosetta Project / Long Now Foundation
<reviews at linguistlist.org>
Homepage: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Amy Renaud <renaud at linguistlist.org>
================================================================
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
===========================Directory==============================
1)
Date: 18-Oct-2006
From: Danilo Giampiccolo < info at celct.it >
Subject: RTE 3 Preliminary Announcement
-------------------------Message 1 ----------------------------------
Date: Wed, 18 Oct 2006 10:50:44
From: Danilo Giampiccolo < info at celct.it >
Subject: RTE 3 Preliminary Announcement
Apologies for cross-posting
3RD PASCAL TEXTUAL ENTAILMENT CHALLENGE AND RESOURCES POOL
PRELIMINARY ANNOUNCEMENT
(http://www.pascal-network.org/Challenges/RTE3/)
INTRODUCTION
Encouraged by the success of the two previous rounds of the Recognizing
Textual Entailment (RTE) challenge (more details to be found at the RTE
websites, http://www.pascal-network.org/Challenges/RTE/;
http://www.pascal-network.org/Challenges/RTE2/), the RTE organizing
committee would like to announce the 3rd round of the PASCAL Recognizing
Textual Entailment (RTE) Challenge.
RTE has been proposed as a generic empirical framework for evaluating
semantic inference in an application-independent manner. The goals of the
first RTE challenge proved to be of great interest, and the community's
response encouraged us to gradually extend its scope. In the 2nd RTE
Challenge, 23 participating groups presented their work at the PASCAL
Challenges Workshop in April 2006 in Venice. The event was successful, and
the number of participants and their contributions demonstrated that
Textual Entailment is a quickly growing field of NLP research. The
workshops have already spawned a large number of publications in major
conferences, with more work in progress (see the RTE-2 website for a
comprehensive reference list).
RTE 3 HIGHLIGHTS: WHAT IS NEW IN THE NEXT CHALLENGE
RTE 3 will follow the same structure as the previous campaigns, in order
to facilitate the participation of newcomers and to allow the assessment
of improvements in earlier systems. Nevertheless, the following
innovations will be introduced to extend the challenge:
+ A limited number of longer texts - i.e. one or two paragraphs long - will
be introduced as a first step towards addressing broader settings which
require discourse analysis.
+ An RTE Resource Pool has been created as a shared central repository and
evaluation forum for resource contributors and users (see details below).
+ A new pilot task is tentatively proposed, based on a dataset derived
from the results of the Answer Validation Exercise in the QA track at
CLEF 2006.
TASKS AND DATA DESCRIPTION
RTE is the task of recognizing whether the meaning of one text can be
inferred (is entailed) from another. The input to the challenge task
consists of pairs of text units, termed T(ext) - the entailing text - and
H(ypothesis) - the entailed text. The task consists of recognizing a
directional relation between the two text fragments, i.e. deciding whether
T entails H. More specifically, we say that T entails H if, typically, a
human reading T would infer that H is most likely true. System results
will be compared to a human-annotated gold-standard test corpus.
The following T-H pairs exemplify the task proposed in the challenge:
T: Dr. George Carlo, an epidemiologist, asserts that medical science
indicates increased risks of tumors, cancer, genetic damage and other
health problems from the use of cell phones.
H: Cell phones pose health risks.
(TRUE)
T: The available scientific reports do not show that any health problems
are associated with the use of wireless phones.
H: Cell phones pose health risks.
(FALSE)
T: Exposure therapy is the main therapy used for treating agoraphobia. As
agoraphobic problems tend to be more widespread, treatment can take longer
- from three to six months.
H: Agoraphobia is a widespread disorder.
(TRUE)
T: With agoraphobia there is widespread avoidance and restriction of
activities and places.
H: Agoraphobia is a widespread disorder.
(FALSE)
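To make the evaluation setup concrete, the sketch below shows, in Python,
one hypothetical way to represent such T-H pairs and to score a system's
TRUE/FALSE judgments against the gold-standard annotation by simple
accuracy. The Pair class, the field names and the toy data are
illustrative assumptions, not the official RTE-3 data format or scoring
script.

# Minimal, illustrative sketch (not the official RTE data format or scorer).
from dataclasses import dataclass

@dataclass
class Pair:
    pair_id: int
    text: str        # T: the entailing text
    hypothesis: str  # H: the (possibly) entailed text
    gold: bool       # gold-standard judgment: does T entail H?

# Two of the example pairs above, encoded by hand.
gold_pairs = [
    Pair(1,
         "Dr. George Carlo, an epidemiologist, asserts that medical science "
         "indicates increased risks of tumors, cancer, genetic damage and "
         "other health problems from the use of cell phones.",
         "Cell phones pose health risks.",
         True),
    Pair(2,
         "The available scientific reports do not show that any health "
         "problems are associated with the use of wireless phones.",
         "Cell phones pose health risks.",
         False),
]

def accuracy(predictions, gold):
    """Fraction of pairs on which the system's TRUE/FALSE judgment
    matches the gold-standard annotation."""
    correct = sum(1 for p in gold if predictions.get(p.pair_id) == p.gold)
    return correct / len(gold)

# A deliberately naive system that always answers TRUE scores 0.5 here.
naive_predictions = {p.pair_id: True for p in gold_pairs}
print(accuracy(naive_predictions, gold_pairs))  # -> 0.5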
The test and development data sets will be based on multiple data sources
and are intended to be representative of typical problems encountered by
applied systems. Examples will be a mixture of pairs that could/could not
be successfully handled by existing systems. As in RTE-2, data types
corresponding to the following application areas will be used (see the
RTE-3 website for more detail):
MAIN TASK
Question Answering (QA):
Simulating a QA scenario in which the hypothesized answer has to be
inferred from the candidate text passage.
''Propositional'' Information Retrieval (IR):
Propositional queries (e.g. ''Women are poorly represented in Parliament'')
from IR evaluation datasets are chosen as hypotheses, and (correct and
incorrect) sentences retrieved by IR systems are proposed as texts.
Information Extraction/Relation Extraction (IE):
Existing systems will be trained on several IE-style relations, and
positive and negative examples from the systems' output will be selected
to generate T-H pairs.
Summarization (SUM):
Using the output of multi-document text summarization systems, sentence
pairs that have high content overlap are converted into T-H pairs. We also
plan to exploit the Pyramid method introduced as an evaluation methodology
in the DUC 2005 competition.
ANSWER VALIDATION PILOT TASK
We are tentatively planning to introduce a pilot task in which T-H pairs
are taken from system results from the Answer Validation Exercise in the
QA track at CLEF 2006. The data are contributed by UNED (Universidad
Nacional de Educación a Distancia).
This year, the aim will be to include a limited proportion of longer texts
- one or two paragraphs long - moving toward more comprehensive scenarios
which require discourse analysis.
THE RTE RESOURCE POOL AT NLPZONE.ORG
One of the key conclusions at the 2nd RTE Challenge Workshop was that
entailment modeling requires vast knowledge resources that correspond to
different types of entailment reasoning. Examples of useful knowledge
include ontological and lexical relationships, paraphrases and entailment
rules, meaning-entailing syntactic transformations, and certain types of
world knowledge. Textual entailment systems also utilize general NLP tools
such as POS taggers, parsers and named-entity recognizers, sometimes
posing specialized requirements on such tools. With so many resources being
continuously released and improved, it can be difficult to know which
particular resource to use.
In response, RTE-3 will include a new activity for building an RTE Resource
Pool, which will serve as a portal and forum for publicizing and tracking
resources, and reporting on their use. We actively solicit both RTE
participants and other members of the NLP community who develop or use
relevant resources to contribute to the RTE Resource Pool. Contributions
include links and descriptions of relevant resources as well as
informational postings regarding resource use and accumulated experience.
Utilized resources will be cited and evaluated by the RTE-3 participants,
and their impact will be reviewed in the RTE-3 organizers' paper, which we
hope will reward contributors of useful resources.
The RTE Resource Pool is hosted as a sub-zone of NLPZone.org, a new
community portal. The resource pool has been seeded with a few resources;
however, its usefulness relies on the contributions of the community
(including yours!). Details on how to contribute to the RTE Resource Pool
are available at http://www.NLPZone.org.
TENTATIVE SCHEDULE
Development Set Release: Early December, 2006.
Test Set Release and Submissions: Early March, 2007.
Workshop: Early Summer, 2007.
(We plan to propose the RTE-3 workshop as an ACL 2007 workshop, to be
held in late June in Prague.)
ORGANIZING COMMITTEE
Danilo Giampiccolo, CELCT (Trento), Italy (coordinator)
Bernardo Magnini, ITC-irst (Trento), Italy (advisor)
Ido Dagan, Bar Ilan University, Israel (supervisor and scientific advisor)
Bill Dolan, Microsoft Research, USA
Patrick Pantel, ISI, USA (RTE Resources Pool)
CONTACT
Danilo Giampiccolo: info at celct.it (please put [RTE3] in the subject line).
SUPPORT
The preparation and running of this challenge has been supported by the
EU-funded PASCAL Network of Excellence on Pattern Analysis, Statistical
Modelling and Computational Learning.
Microsoft Research and CELCT will provide assistance in the creation and
annotation of the data sets.
Linguistic Field(s): Computational Linguistics
Text/Corpus Linguistics
-----------------------------------------------------------
LINGUIST List: Vol-17-3066