ACL-99 Workshop on discourse tagging - reminder

Morena Danieli danieli at CSELT.IT
Wed Mar 24 15:07:17 UTC 1999


************** Apologies for multiple copies ****************

TOWARDS STANDARDS AND TOOLS FOR DISCOURSE TAGGING
                        (ACL-99 Workshop)
                             June 22, 1999
                        University of Maryland
                       College Park, MD,  USA

               URL: http://www.mri.mq.edu.au/conf/acl99/

DESCRIPTION

Discourse tagging assigns labels from a tag set to discourse units in
texts or dialogues. The discourse units range from words and phrases,
such as referring expressions, to multi-utterance units identified by
criteria such as speaker intention or initiative. Just as the
availability of syntactically annotated corpora has resulted in major
advances in sentence-level natural language processing, we expect that
corpora tagged for discourse features will lead to similar advances in
discourse processing.

Work on discourse tagging has gained momentum in the last 3-4 years.
Three major initiatives in this area are: the Discourse Resource
Initiative (http://www.georgetown.edu/luperfoy/Discourse-Treebank/),
which has organized yearly international workshops addressing the
standardization of discourse tagging schemes for coreference, for
dialogue acts, and for higher-level discourse structures; MATE
(http://mate.mip.ou.dk/), a project co-funded by the European Union,
whose aim is to develop tools and standards for tagging spoken
dialogue corpora at different levels, including the discourse level;
and the Global Document Annotation initiative
(http://www.etl.go.jp/etl/nl/GDA), which aims at having Internet
authors annotate their documents with a common standard tag set that
allows machines to recognize the semantic and pragmatic structures of
documents.

Despite the progress made by these three initiatives, there is still
much work to be done before there are widely accepted (standardized)
discourse tagging schemes suitable for sharing and distribution across
sites and projects.  Moreover, there has not yet been an open forum in
which researchers working in this area could participate. This
workshop will provide such a forum.

Submissions are invited on, but not limited to, the following topics
and issues:

1. How can standardization for discourse tagging concretely be
achieved?  By developing a single coding scheme, or a set of coding
schemes, one for each phenomenon of interest? Or rather, by developing
some specification guidelines and mappings from one scheme to another?
In some other way?

2. Cross-level coding: All of the initiatives mentioned above promote
an approach in which coding schemes are developed at different levels,
rather than an approach in which a monolithic scheme addresses all
phenomena. Given this methodology, the issue of cross-level coding
arises, namely, how can coding schemes for different levels take
advantage of each other and allow coding of cross-level relationships?
Is it possible to use corpus annotations at different annotation
levels to examine the interdependence of linguistic phenomena?

3. Coding schemes and theories of discourse: Is it possible to develop
coding schemes that faithfully reflect a discourse theory? If so,
is it desirable? Conversely, can corpora coded for discourse issues
help advance our theoretical understanding of discourse phenomena?

4. Coding schemes and applications: Is it possible to design discourse
coding schemes independently from the applications that the tagged
corpora may be used to inform (e.g., to train a speech act
classifier)?

5. Coding schemes and reliability: Thus far, experience in developing
schemes for discourse phenomena that can be coded reliably has been
mixed.  Whatever the reason (e.g., lack of an overarching theory for
discourse, genuine ambiguity and misunderstandings in real dialogue
reflected in the coding, etc.), how can we devise reliable coding
schemes? What reliability measures should be used: are widely used
measures (Kappa, Alpha, precision and recall) and the corresponding
standards appropriate for discourse tagging?  If not, what other
measures can we use?  Is reliability affected by whether naive or
expert coders are used?

6. Tools for discourse tagging: What specific features of a tool does
discourse tagging require? Can we just extend tools developed for
other purposes, e.g. for syntactic tagging? Do we need to develop new
tools?

7. Discourse tagging and evaluation: Some paradigms for evaluating
dialogue systems make use of tagged corpora. How are discourse tagging
and tagging for evaluation purposes related? Are there some discourse
tags that may be used as evaluation tags, or is it advisable to
introduce another dimension of tagging?

In addition to papers, prospective participants may be asked to do a
small coding exercise before the workshop, in order to test out
various tagging schemes.  Prospective participants who have developed
tools are welcome to bring a demo with them.


FORMAT FOR SUBMISSION

Authors are requested to submit an electronic version of their
papers. Send your electronic submission to both Marilyn Walker
(walker at research.att.com) and Morena Danieli (danieli at cselt.it).  If
electronic submission is impossible, please contact the organizers to
arrange for hardcopy submission (four hardcopies will be required).
Maximum length is 6 pages including figures and references.

Please conform to the traditional two-column ACL Proceedings
format.  Style files can be downloaded from
ftp://ftp.cs.columbia.edu/acl-l/Styfiles/Proceedings/


IMPORTANT DATES

Paper submission deadline:      March 26
Notification of acceptance:     April 16
Camera ready papers due:        April 30

ORGANIZING COMMITTEE

Marilyn Walker (Contact Person)
AT&T Labs - Research
180 Park Ave
Rm. E-103
Florham Park, N.J. 07932, USA
walker at research.att.com
+1-973-360-8956

Morena Danieli (Contact Person)
CSELT-Centro Studi E Laboratori Telecomunicazioni
CF/VR
Via Reiss-Romoli, 274
I-10148 Torino, Italia
Morena.Danieli at cselt.it
+39-011-2286247

Johanna D. Moore
University of Edinburgh
Human Communication Research Centre
2, Buccleuch Place
Edinburgh EH8 9LW, UK
jmoore at cogsci.ed.ac.uk
+44-131-6511336

Barbara Di Eugenio
Department of Electrical Engineering and Computer Science
Science and Engineering Offices
851 South Morgan Street (M/C 154)
Chicago, Illinois 60607-7053, USA
bdieugen at eecs.uic.edu
+1-312-996-3422


PROGRAM COMMITTEE

Jean Carletta - HCRC, University of Edinburgh
Laila Dybkjaer - MIP, Odense University
Julia Hirschberg - AT&T
Diane Litman - AT&T
Masato Ishizaki - JAIST
David Novick - EURISCO
Silvia Quazza - CSELT
Daniel Jurafsky - University of Colorado


