[Corpora-List] RTE 3 Preliminary Announcement

Wed Oct 18 13:13:13 UTC 2006

Apologies for cross-postings

3RD PASCAL TEXTUAL ENTAILMENT CHALLENGE AND RESOURCES POOL
PRELIMINARY ANNOUNCEMENT
(http://www.pascal-network.org/Challenges/RTE3/)

INTRODUCTION 

Encouraged by the success of the two previous rounds of the Recognizing Textual Entailment (RTE) challenge (more details to be found at the RTE websites, http://www.pascal-network.org/Challenges/RTE/ and http://www.pascal-network.org/Challenges/RTE2/), the RTE organizing committee would like to announce the 3rd round of the PASCAL Recognizing Textual Entailment (RTE) Challenge.

RTE has been proposed as a generic empirical framework for evaluating semantic inference in an application-independent manner. The goal of the first RTE challenge proved to be of great interest, and the community's response encouraged us to gradually extend its scope. In the 2nd RTE Challenge, 23 participating groups presented their work at the PASCAL Challenges Workshop in April 2006 in Venice. The event was successful, and the number of participants and their contributions demonstrated that Textual Entailment is a quickly growing field of NLP research. Already, the workshops have spawned a large number of publications in major conferences, with more work in progress (see the RTE-2 website for a comprehensive reference list).

RTE 3 HIGHLIGHTS: WHAT IS NEW IN THE NEXT CHALLENGE

RTE 3 will follow the same structure as the previous campaign, to facilitate the participation of newcomers and to allow improvements of earlier systems to be assessed. Nevertheless, the following innovations will be introduced to extend the challenge:

+ A limited number of longer texts - i.e. one or two paragraphs long - will be introduced as a first step towards addressing broader settings which require discourse analysis. 

+ An RTE Resource Pool has been created as a shared central repository and evaluation forum for resource contributors and users (see details below). 

+ A dataset based on the results from the Answer Validation Exercise in the QA track at CLEF 2006 is tentatively proposed as a new pilot task.

TASKS AND DATA DESCRIPTION

RTE is the task of recognizing that the meaning of one text is entailed by (can be inferred from) another. The input to the challenge task consists of pairs of text units, termed T(ext) - the entailing text, and H(ypothesis) - the entailed text. The task consists of recognizing a directional relation between the two text fragments, deciding whether T entails H. More specifically, we say that T entails H if, typically, a human reading T would infer that H is most likely true. System results will be compared to a human-annotated gold-standard test corpus.

The following T/H pairs exemplify the task proposed in the challenge:

T: Dr. George Carlo, an epidemiologist, asserts that medical science indicates increased risks of tumors, cancer, genetic damage and other health problems from the use of cell phones.
H: Cell phones pose health risks.
(TRUE)

T: The available scientific reports do not show that any health problems are associated with the use of wireless phones.
H: Cell phones pose health risks.
(FALSE)

T: Exposure therapy is the main therapy used for treating agoraphobia. As agoraphobic problems tend to be more widespread, treatment can take longer - from three to six months.
H: Agoraphobia is a widespread disorder. 
(TRUE)

T: With agoraphobia there is widespread avoidance and restriction of activities and places.
H: Agoraphobia is a widespread disorder.
(FALSE)
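
As an illustration only, the pairs above and the comparison of system output against gold-standard labels could be sketched as follows. This is a minimal sketch under stated assumptions: the names Pair, accuracy and always_true are hypothetical, and plain accuracy is assumed as the measure, since this announcement does not specify the evaluation metric or the data format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pair:
    """One T/H pair with its gold label (hypothetical structure, not the official format)."""
    text: str         # T: the entailing text
    hypothesis: str   # H: the entailed text
    entails: bool     # gold label: True if T entails H


# The four example pairs from the announcement.
PAIRS: List[Pair] = [
    Pair("Dr. George Carlo, an epidemiologist, asserts that medical science "
         "indicates increased risks of tumors, cancer, genetic damage and other "
         "health problems from the use of cell phones.",
         "Cell phones pose health risks.", True),
    Pair("The available scientific reports do not show that any health problems "
         "are associated with the use of wireless phones.",
         "Cell phones pose health risks.", False),
    Pair("Exposure therapy is the main therapy used for treating agoraphobia. "
         "As agoraphobic problems tend to be more widespread, treatment can take "
         "longer - from three to six months.",
         "Agoraphobia is a widespread disorder.", True),
    Pair("With agoraphobia there is widespread avoidance and restriction of "
         "activities and places.",
         "Agoraphobia is a widespread disorder.", False),
]


def accuracy(predictions: List[bool], pairs: List[Pair]) -> float:
    """Fraction of pairs whose predicted label matches the gold label (assumed metric)."""
    if len(predictions) != len(pairs):
        raise ValueError("one prediction is required per pair")
    correct = sum(pred == pair.entails for pred, pair in zip(predictions, pairs))
    return correct / len(pairs)


def always_true(pair: Pair) -> bool:
    """Trivial baseline (assumption for illustration): always answer TRUE."""
    return True


def evaluate(system: Callable[[Pair], bool], pairs: List[Pair]) -> float:
    """Run a system on every pair and score it against the gold labels."""
    return accuracy([system(p) for p in pairs], pairs)


baseline_score = evaluate(always_true, PAIRS)
```

Because two of the four example pairs are labeled TRUE, the always-TRUE baseline matches the gold label on exactly half of them. Note that both hypotheses share most of their vocabulary with a TRUE text and a FALSE text alike, so surface matching alone would not separate these examples.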

The test and development data sets will be based on multiple data sources and are intended to be representative of typical problems encountered by applied systems. Examples will be a mixture of pairs that could and could not be successfully handled by existing systems. As in RTE-2, data types corresponding to the following application areas will be used (see the RTE-3 website for more detail):

MAIN TASK

Question Answering (QA):
Simulating a QA scenario in which the hypothesized answer has to be inferred from the candidate text passage.

"Propositional" Information Retrieval (IR):
Propositional queries (e.g. "Women are poorly represented in Parliament") from IR evaluation datasets are chosen as hypotheses, and (correct and incorrect) sentences retrieved by IR systems are proposed as texts.

Information Extraction/Relation Extraction (IE):
Existing systems will be trained on several IE-style relations, and positive and negative examples from the system's output will be picked to generate T-H pairs.

Summarization (SUM):
Using the output of multi-document text summarization systems, sentence pairs that have high content overlap are converted into T-H pairs. We also plan to exploit the Pyramid method introduced as an evaluation methodology in the DUC 2005 competition.
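
To make the idea of selecting sentence pairs with high content overlap concrete, a rough sketch follows. It is an assumption for illustration, not the organizers' procedure: the overlap measure (Jaccard similarity over lowercase word sets), the 0.5 threshold, and the function names are all hypothetical, and any selected pair would still need human annotation before it could serve as a T-H pair.

```python
import re
from typing import List, Set, Tuple


def tokens(sentence: str) -> Set[str]:
    """Lowercase alphanumeric word set of a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets (assumed overlap measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(sentences: List[str],
                    threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Return sentence pairs whose overlap reaches the (assumed) threshold."""
    selected = []
    for i, first in enumerate(sentences):
        for second in sentences[i + 1:]:
            if overlap(first, second) >= threshold:
                selected.append((first, second))
    return selected


# Hypothetical sentences, invented for this example.
example = candidate_pairs(["the cat sat", "the cat sat down", "dogs bark"])
```

In the usage line, only the first two sentences share enough words to be selected; the third has no words in common with either.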

ANSWER VALIDATION PILOT TASK

We are tentatively planning to introduce a pilot task where T/H pairs are taken from system results from the Answer Validation Exercise in the QA track at CLEF 2006. This data is contributed by UNED (Universidad Nacional de Educación a Distancia).

This year, the aim will be to include a limited proportion of longer texts (one or two paragraphs long), moving toward more comprehensive scenarios which require discourse analysis.

THE RTE RESOURCE POOL AT NLPZONE.ORG

One of the key conclusions at the 2nd RTE Challenge Workshop was that entailment modeling requires vast knowledge resources that correspond to different types of entailment reasoning. Examples of useful knowledge include ontological and lexical relationships, paraphrases and entailment rules, meaning-entailing syntactic transformations and certain types of world knowledge. Textual entailment systems also utilize general NLP tools such as POS taggers, parsers and named-entity recognizers, sometimes placing specialized requirements on such tools. With so many resources being continuously released and improved, it can be difficult to know which particular resource to use.

In response, RTE-3 will include a new activity for building an RTE Resource Pool, which will serve as a portal and forum for publicizing and tracking resources, and reporting on their use. We actively solicit both RTE participants and other members of the NLP community who develop or use relevant resources to contribute to the RTE Resource Pool. Contributions include links to and descriptions of relevant resources as well as informational postings regarding resource use and accumulated experience. Utilized resources will be cited and evaluated by the RTE-3 participants and their impact will be reviewed in the RTE-3 organizers' paper, which we hope will reward contributors of useful resources.

The RTE Resource Pool is hosted as a sub-zone of NLPZone.org, a new community portal. The resource pool has been seeded with a few resources; however, its usefulness relies on the community's (including your!) contributions. Details on how to contribute to the RTE Resource Pool are available at http://www.NLPZone.org.

TENTATIVE SCHEDULE

Development Set Release:                Early December, 2006.
Test Set Release and Submissions:       Early March, 2007.
Workshop:                               Early Summer, 2007.
(We plan to propose having the RTE-3 workshop as an ACL 2007 workshop, to be held late June in Prague).

ORGANIZING COMMITTEE

Danilo Giampiccolo, CELCT (Trento), Italy (coordinator)
Bernardo Magnini, ITC-irst (Trento), Italy (advisor)
Ido Dagan, Bar Ilan University, Israel (supervisor and scientific advisor)
Bill Dolan, Microsoft Research, USA
Patrick Pantel, ISI, USA (RTE Resources Pool)

CONTACT
Danilo Giampiccolo: info at celct.it, and put [RTE3] in the subject line.

SUPPORT
The preparation and running of this challenge has been supported by the EU-funded PASCAL Network of Excellence on Pattern Analysis, Statistical Modeling and Computational Learning. 

Microsoft Research and CELCT will provide assistance in the creation and annotation of the data sets.