[Corpora-List] SIGIR 2004 Workshop 2nd CFP: Information Retrieval for Question Answering (IR4QA)

Mark Greenwood M.Greenwood at dcs.shef.ac.uk
Sun May 23 11:08:45 UTC 2004


                       2nd Call for Papers

                        SIGIR'04 Workshop

           INFORMATION RETRIEVAL FOR QUESTION ANSWERING (IR4QA)

                   July 29, 2004, Sheffield, UK


Open domain question answering has become a very active research area
over the past few years, due in large measure to the stimulus of the
TREC Question Answering track. This track addresses the task of finding
*answers* to natural language (NL) questions (e.g. ``How tall is the
Eiffel Tower?'' ``Who is Aaron Copland?'') from large text collections.
This task stands in contrast to the more conventional IR task of
retrieving *documents* relevant to a query, where the query may be
simply a collection of keywords (e.g. ``Eiffel Tower'', ``American
composer, born Brooklyn NY 1900, ...'').

Finding answers requires processing texts at a level of detail that
cannot be carried out at retrieval time for very large text collections.
This limitation has led many researchers to propose, broadly, a
two-stage approach to the QA task. In stage one a subset of
query-relevant texts is selected from the whole collection.  In stage
two this subset is subjected to detailed processing for answer
extraction. To date stage one has received limited explicit attention,
despite its obvious importance -- performance at stage two is bounded
by performance at stage one.  The goal of this workshop is to correct
this situation and, hopefully, to draw the attention of IR researchers
to the specific challenges raised by QA.
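
A minimal sketch of this two-stage pipeline, in Python, may make the
division of labour concrete. The term-overlap scorer standing in for
the stage-one IR engine and the pass-through answer extractor are
illustrative assumptions only, not components of any particular system:

    # Minimal sketch of the two-stage QA architecture described above.
    # The scoring and extraction steps are illustrative placeholders.
    from typing import List

    def retrieve(query: str, collection: List[str], k: int = 20) -> List[str]:
        """Stage one: select a subset of query-relevant texts from the
        whole collection. A crude term-overlap score stands in for a
        conventional IR engine."""
        terms = set(query.lower().split())
        scored = [(len(terms & set(doc.lower().split())), doc)
                  for doc in collection]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:k] if score > 0]

    def extract_answers(question: str, candidates: List[str]) -> List[str]:
        """Stage two: detailed processing of the retrieved subset for
        answer extraction. A real system would apply NL analysis here;
        this placeholder simply passes the candidates through."""
        return candidates

    def answer(question: str, collection: List[str]) -> List[str]:
        candidates = retrieve(question, collection)   # stage one: retrieval
        return extract_answers(question, candidates)  # stage two: extraction

Whatever replaces the placeholders, the bound noted above still holds:
stage two can only extract answers from what stage one returns.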

A straightforward approach to stage one is to employ a conventional IR
engine, using the NL question as the query and with the collection
indexed in the standard manner, to retrieve the initial set of candidate
answer bearing documents for stage two.  However, a number of
possibilities arise to optimise this set-up for QA, including:
o preprocessing the question in creating the IR query;
o preprocessing the collection to identify significant information that
  can be included in the indexation for retrieval;
o adapting the similarity metric used in selecting documents;
o modifying the form of retrieval return, e.g. to deliver passages
  rather than whole documents.
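
As one illustration of the first of these possibilities, the sketch
below shows question preprocessing when forming the IR query. The
stopword list and the simple term filter are assumptions made for
illustration, not a recommended query formation strategy:

    # Illustrative question-to-query preprocessing (assumed stopword list).
    import re

    STOPWORDS = {"how", "what", "who", "where", "when", "why", "is",
                 "are", "was", "were", "the", "a", "an", "of", "in",
                 "to", "do", "does", "did"}

    def question_to_query(question: str) -> str:
        """Strip question words and other stopwords, keeping the content
        terms most likely to match answer-bearing documents."""
        tokens = re.findall(r"[A-Za-z0-9']+", question.lower())
        return " ".join(t for t in tokens if t not in STOPWORDS)

    print(question_to_query("How tall is the Eiffel Tower?"))
    # -> "tall eiffel tower"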

For this workshop, we solicit papers that address any aspect of how this
first, retrieval stage of QA can be adapted to improve overall system
performance. Possible topics include, but are not limited to:
o parametrizations/optimizations of specific IR systems for QA
o studies of query formation strategies suited to QA
o different uses of IR for factoid vs. non-factoid questions
o utility of term matching constraints, e.g. term proximity, for QA
o analyses of passage retrieval vs full document retrieval for QA
o analyses of boolean vs ranked retrieval for QA
o impact of IR performance on overall QA performance
o named entity preprocessing of questions or collections
o corpus preprocessing to create corpus-specific thesauri for question
  expansion
o evaluation measures for assessing IR for QA

The workshop will include paper presentations and discussion.  All those
wishing to make a presentation should submit a 5-8 page position paper;
other attendees may submit a short abstract on why this topic is of
interest to them. The papers should describe recent work and may be
preliminary in nature.  The programme committee will arrange the
presentations and discussion based on the quality of submissions and
expressed interests of the attendees, and may invite other presentations
as well. See http://www.sigir.org/sigir2004 for further details.

Important Dates
===============

Position paper submission:    June 7
Acceptance notification:      June 23
Final papers due:             July 6
Workshop:                     July 29

Submission Instructions
=======================

Position papers should be no more than 4000 words (5-8 pages). The
standard ACM conference style is recommended (see:
http://www.acm.org/sigs/pubs/proceed/template.html). Submissions must be
sent electronically in PDF or PostScript format to:

Rob Gaizauskas
R.Gaizauskas at sheffield.ac.uk

Workshop Organizers
===================

Rob Gaizauskas          (University of Sheffield)
Mark Hepple             (University of Sheffield)
Mark Greenwood          (University of Sheffield)

Programme Committee
===================

Shannon Bradshaw        (University of Iowa)
Charles Clarke          (University of Waterloo)
Sanda Harabagiu         (University of Texas at Dallas)
Eduard Hovy             (University of Southern California)
Jimmy Lin               (Massachusetts Institute of Technology)
Christof Monz           (University of Maryland)
John Prager             (IBM)
Dragomir Radev          (University of Michigan)
Maarten de Rijke        (University of Amsterdam)
Horacio Saggion         (University of Sheffield)
Karen Sparck-Jones      (University of Cambridge)
Tomek Strzalkowski      (State University of New York, Albany)
Ellen Voorhees          (NIST)
