Corpora: DEADLINE EXTENSION: ACL-2001 Workshop on Evaluation Methodologies for Language & Dialogue Systems

Priscilla Rasmussen rasmusse at cs.rutgers.edu
Wed Apr 11 15:40:40 UTC 2001


[ Extended submission deadline: **22 April**]

Call for Papers

Workshop on Evaluation Methodologies for Language and Dialogue Systems
ACL/EACL 2001
Toulouse, France
July 6-7, 2001

WORKSHOP GOALS

The aim of this two-day workshop is to identify and synthesize
current needs for language-technology evaluation.

The first day of the workshop will focus on one of the most challenging
current issues in language engineering: the evaluation of dialogue
systems and models. The second day will extend the discussion to address
the problem of evaluation in language engineering more broadly and on
more theoretical grounds.

The space of possible dialogues is enormous, even for limited domains
like travel information servers. The generalization of evaluation
methodologies across different application domains and languages is an
open problem. Review of published evaluations of dialogue models and
systems suggests that usability techniques are the standard method.
Dialogue-based systems are often evaluated in terms of standard,
objective usability metrics, such as task-completion time and number of
user actions. Researchers have proposed more precise, empirical,
theory-based methods for modifying and testing the underlying dialogue
model, but usability testing remains the most widely used method of
evaluation. For task-based interaction, typical measures of
effectiveness are time-to-completion and task outcome, but the
evaluation should focus on user satisfaction rather than on arbitrary
effectiveness measurements. Indeed, current approaches to measuring the
effectiveness of dialogue models and systems face several problems (a
sketch of the standard metrics follows the list below):

  o Direct measures are unhelpful because efficient performance on the
    nominal task may not represent the most effective interaction.
  o Indirect measures usually rely on judgment and are vulnerable to
    weak relationships between the inputs and outputs.
  o Subjective measures are unreliable and domain-specific.
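
As a point of reference, the objective metrics mentioned above are
straightforward to compute. The following Python sketch is purely
illustrative: the Turn structure, field names, and log format are our
own assumptions, not part of any standard or of this call.

    # Minimal sketch: computing the standard objective usability metrics
    # (time-to-completion, number of user actions, task outcome) from a
    # hypothetical dialogue log.  The log format is an assumption.
    from dataclasses import dataclass

    @dataclass
    class Turn:
        speaker: str   # "user" or "system"
        start: float   # seconds from dialogue start
        end: float

    def usability_metrics(turns, task_completed):
        user_turns = [t for t in turns if t.speaker == "user"]
        return {
            "time_to_completion": turns[-1].end - turns[0].start,
            "user_actions": len(user_turns),
            "task_outcome": task_completed,
        }

    # Example: a three-turn exchange whose nominal task succeeded.
    log = [Turn("system", 0.0, 2.1), Turn("user", 2.5, 4.0),
           Turn("system", 4.2, 6.0)]
    print(usability_metrics(log, task_completed=True))

Note that such measures say nothing about whether the interaction was
effective for the user, which is precisely the first problem listed
above.
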
For its first day, the workshop organizers solicit papers on these
issues, with particular emphasis on methods that go beyond usability
testing to address the underlying dialogue model. Representative
questions to be addressed include:

  o How do we deal with the combinatorial explosion of dialogue states
    (see the illustrative count after this list)?
  o How can satisfaction be measured with respect to underlying
    dialogue models?
  o Are there useful direct measures of dialogue properties that do not
    depend on task efficiency?
  o What is the role of agent-based simulation in evaluation of
    dialogue models?
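
To give a sense of scale for the first question, here is a
back-of-the-envelope count under a simple slot-filling view of dialogue
state; the figures are illustrative assumptions, not taken from any
particular system.

    # Even a small slot-filling travel domain yields a huge state space.
    # With S information slots, each either unfilled or holding one of
    # V values, there are (V + 1) ** S distinct information states --
    # before counting turn order, confirmations, or repair sub-dialogues.
    slots, values = 5, 10
    print((values + 1) ** slots)  # 161051 states for 5 slots x 10 values

Exhaustive testing of such a space is clearly impractical, which is one
motivation for the simulation-based approaches raised in the last
question.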

Of course, the problems faced in evaluating dialogue models and systems
are found in other domains of language engineering, even for
non-interactive processes such as part-of-speech tagging, parsing,
semantic disambiguation, information extraction, speech transcription,
and audio document indexing. So the issue of evaluation can be viewed at
a more generic level, raising fundamental, theoretical questions such
as:

  o What interest and benefits does evaluation hold for language
    engineering?
  o Do we really need specific evaluation methodologies, given that some
    form of evaluation should always be present in any scientific
    investigation?
  o If evaluation is needed in language engineering, is that the case
    for all domains?
  o What form should it take: technology evaluation (task-oriented, in a
    laboratory environment) or field/user evaluation (complete systems
    in real-life conditions)?

We have seen above that the evaluation of dialogue models is still an
unsolved problem, but for domains where metrics already exist, are they
satisfactory and sufficient? How can we take into account, or abstract
away from, the subjective factor introduced by human operators in the
process? Do similarity measures and standards offer appropriate answers
to this problem? Most efforts focus on evaluating processes, but what
about the evaluation of language resources?
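
One standard way to quantify that subjective human factor is an
inter-annotator agreement coefficient such as Cohen's kappa. The sketch
below is illustrative only; the judgments are made-up data, and the
call does not prescribe this particular measure.

    # Cohen's kappa: agreement between two human judges beyond chance.
    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (observed - expected) / (1 - expected)

    # Two judges rating six system responses (made-up data):
    a = ["good", "good", "bad", "good", "bad", "bad"]
    b = ["good", "bad", "bad", "good", "bad", "good"]
    print(round(cohens_kappa(a, b), 3))  # 0.333: weak agreement

A low kappa signals that a metric depends heavily on who is judging,
which is exactly the reliability concern raised above for subjective
measures.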

For its second day of work, the workshop organizers solicit papers on
these issues, with the intent to address the problem of evaluation both
from a broader perspective (including novel applications domains for
evaluation, new metrics for known tasks and resource evaluation) and a
more theoretical point of view (including formal theory of evaluation
and infrastructural needs of language engineering).

NOTE: People who would like to submit a paper on lexical semantic
disambiguation evaluation should consider the parallel workshop (July
5-6) concluding the SENSEVAL-2 evaluation campaign.

-------------------------------------------------------------

WORKSHOP ORGANIZATION

The organization of each of the two days of the workshop will reflect
the workshop's two main themes. Each day will begin with a session of
presentations of selected papers, followed by panel discussions that
synthesize and develop possible methodologies from additional selected
workshop papers.

WORKSHOP PARTICIPATION

The workshop seeks participation from people involved or interested in
the problem of evaluation in language processing, and from the research
and industrial communities that study and implement dialogue models for
natural-language interaction systems.

The first part of the workshop will specifically draw on the
natural-language interaction community, such as the one developing at
the confluence of SIGdial and SIGCHI, which will find in this workshop
an atmosphere more flavored by computational-linguistics issues (see,
for example, the First SIGdial Workshop on Discourse and Dialogue).

The second part of the workshop is intended to provide a forum for a
broader audience, more in the spirit of the one that attended the
LREC'2000 Satellite Workshop on Evaluation (see
http://www.limsi.fr/TLP/CLASS), in particular offering an opportunity to
people involved in language-engineering evaluation (e.g., the CLASS
audience) in the context of national or transnational projects or
programs, both in Europe and abroad.

-------------------------------------------------------------

SUBMISSION DETAILS

Paper submissions should follow the two-column format of ACL proceedings
and should not exceed eight (8) pages, including references. We strongly
recommend the use of the ACL LaTeX style files or Microsoft Word style
files
tailored for this year's conference. They are available from the
ACL-2001 program committee Web site at http://acl2001.dfki.de/style/.

Papers should be submitted electronically, as a LaTeX, Word, or PDF
file, to either:

Patrick Paroubek, pap at limsi.fr
Karen Ward, kward at cs.utep.edu

-------------------------------------------------------------

TIMETABLE OF IMPORTANT DATES

Deadline for workshop paper submissions: **April 22, 2001**
Deadline for notification of workshop paper acceptance: May 6, 2001
Deadline for camera-ready workshop papers:  May 16, 2001
Workshop date:  July 6-7, 2001


-------------------------------------------------------------

WORKSHOP ORGANIZING COMMITTEE

David G. Novick, UTEP
novick at cs.utep.edu
http://www.cs.utep.edu/novick

Joseph Mariani, Limsi - CNRS
mariani at limsi.fr
http://www.limsi.fr/Individu/mariani

Candy Kamm, AT&T Labs
cak at research.att.com
http://www.research.att.com/info/cak

Patrick Paroubek, Limsi - CNRS
pap at limsi.fr
http://www.limsi.fr/Individu/pap

Nils Dahlbäck, Linköping University
nilda at ida.liu.se
http://www.ida.liu.se/~nilda/

Frankie James, NASA Ames Research Center
fjames at riacs.edu
http://www-pcd.stanford.edu/frankie/

Karen Ward, UTEP, kward at cs.utep.edu
http://www.cs.utep.edu/kward


-------------------------------------------------------------

SCIENTIFIC COMMITTEE

David G. Novick
Joseph Mariani
Candy Kamm
Patrick Paroubek
Nils Dahlbäck
Frankie James
Karen Ward
Christian Jacquemin
Niels Ole Bernsen
Stephane Chaudiron
Khalid Choukri
Martin Rajman
Robert Gaizauskas
Donna Harman
Lynette Hirschman (tentative)
David Pallett (tentative)
Carol Peters (tentative)
Jose Pardo (tentative)
Herman Steeneken (tentative)
Oliviero Stock (tentative)
Saïd Tazi
Hans Uszkoreit (tentative)

-------------------------------------------------------------

SPONSORS

 ACL 2001
 CLASS
 ELRA
 ELSNET
 SIGdial

-------------------------------------------------------------

ADDITIONAL INFORMATION

Additional information on the workshop, including accepted papers and
the workshop schedule, will be made available as needed at
http://www.limsi.fr/TLP/CLASS/eacl01.html


