[Corpora-List] Summary of responses: Pragmatic annotations

Wed Sep 14 08:36:27 UTC 2005

Hello,

Two weeks ago I posted a question about pragmatic annotations. Thanks to all 
of those who responded. Here's a brief summary.

'Further levels of annotation' by Geoffrey Leech, Tony McEnery and
Martin Wynne, in Corpus Annotation, edited by Roger Garside, Geoffrey
Leech and Anthony McEnery, Longman, Harlow, 1997.

-----------------------------------------------
ACL 2004 Workshop on Discourse Annotation:

http://www.cllt.osu.edu/dbyron/acl04/

------------------------------------------------
Some exploratory experiments regarding
general-knowledge-based cohesion in texts:

Beigman Klebanov, B., 2005.
"Using Readers to Identify Lexical Cohesive Structures in Texts"
In Proceedings of ACL-2005 Student Session, Ann Arbor, USA, June 2005,
pp. 55-60.

The annotation guidelines we've given to the subjects can be found on my
webpage: http://www.cs.huji.ac.il/~beata

---------------------------------------------------
The work of Samuel et al. at COLING-ACL 1998 in Montreal. The field has come
quite a way since then, with lots of people joining in; below are a few
references to work at Sheffield that gets good results from rather simpler
classifier training than is usual:

Webb, N., M. Hepple and Y. Wilks (2005)
Error Analysis of Dialogue Act Classification, in Proceedings of the 8th 
International Conference on Text, Speech and Dialogue, Carlsbad, Czech 
Republic.

Webb, N., M. Hepple and Y. Wilks (2005)
Empirical determination of thresholds for optimal dialogue act 
classification, in Proceedings of the Ninth Workshop on the Semantics and 
Pragmatics of Dialogue (SemDial), Nancy.

Webb, N., M. Hepple and Y. Wilks (2005)
Dialogue Act Classification using Intra-Utterance Features, in Proceedings 
of the AAAI Workshop on Spoken Language Understanding, Pittsburgh.

Webb, N., H. Hardy, C. Ursu, M. Wu, T. Strzalkowski and Y. Wilks (2005)
Data-Driven Language Understanding for Spoken Language Dialogue, in 
Proceedings of the AAAI Workshop on Spoken Language Understanding, 
Pittsburgh, 2005.
-----------------------------------------------------------
I don't know if you count coreference as pragmatics, but you could look at
Aone and Bennett's (1994) Discourse Tagging Tool, the Alembic Workbench and
CLinkA. There was also a workshop at ACL 2005 on frontiers in corpus annotation
(http://nlp.cs.nyu.edu/meyers/frontiers/2005.html), which might have some
useful pointers.

------------------------------------------------------------ 
Popescu-Belis et al., 2003, A Thematic Bibliography on Dialogue Processing.
Section 3.4 on Dialogue data and annotation.
http://www.issco.unige.ch/projects/im2/mdm/docs/biblio/mdm-biblio.html

Dhillon et al., 2004, Meeting Recorder Project: Dialogue Act Labeling Guide
http://www.icsi.berkeley.edu/ftp/global/pub/speech/papers/MRDA-manual.pdf

Stolcke et al., 2000, Dialogue Act Modeling for Automatic Tagging and
Recognition of Conversational Speech. Computational Linguistics 26(3),
339-373.

Jurafsky et al., 1997, Switchboard SWBD-DAMSL Shallow-Discourse-Function
Annotation
http://www.colorado.edu/ling/jurafsky/manual.august1.html

Carletta et al., 1996, HCRC Dialogue Structure Coding Manual
http://www.hcrc.ed.ac.uk/publications/tr-82.ps.gz

---------------------------------------------------------------

May I call your attention to work we at Tilburg University have done on
classifying dialogue acts in spoken dialogues.
We applied machine learning to a Dutch corpus of human-machine dialogues
conducted with a spoken dialogue system.
We used a small, domain-specific tagset that covered different aspects
of pragmatic and semantic phenomena.

You can find our related publications at
http://ilk.kub.nl/~piroska/research.htm, for example:

# P. Lendvai, A. van den Bosch: /Robust ASR lattice representation types
in pragma-semantic processing of spoken input./ In: Proc. of the AAAI
Spoken Language Understanding Workshop, SLU-2005, Pittsburgh, PA, 2005,
pages 15-22.

# P. Lendvai: /Extracting Information from Spoken User Input. A Machine
Learning Approach./ Ph.D. thesis, Tilburg University, Netherlands, 2004.

# P. Lendvai, A. van den Bosch, E. Krahmer: /Machine Learning for Shallow
Interpretation of User Utterances in Spoken Dialogue Systems./ In: Proc.
of EACL-03 Workshop on Dialogue Systems: Interaction, Adaptation and
Styles of Management. Budapest, Hungary, 2003. pages 69-78.

# P. Lendvai, A. van den Bosch, E. Krahmer, M. Swerts: /Multi-feature
error detection./ In: Theune, M., Nijholt, A. & Hondorp, H. (Eds.),
Language and Computers: Studies in Practical Linguistics. (pp. 163-178).
Amsterdam: Rodopi. 2002.

# P. Lendvai, A. van den Bosch, E. Krahmer, M. Swerts:
/Improving machine-learned detection of miscommunications in
human-machine dialogues through informed data splitting./ In: Kuebler,
S. & Hinrichs, E. (Eds.), Machine Learning Approaches in Computational
Linguistics. (pp. 1-15). Trento, Italy: ESSLLI. 2002.

-----------------------------------------------------------------------------

There's an article by Lampert and Ervin-Tripp in _Talking Data:
Transcription and Coding in Discourse Research_, 1993 (edited by Martin
Lampert and me), which describes principles for designing, implementing and
evaluating a system of codes (including intercoder reliability). It is
illustrated with examples from the coding of control acts in children's
speech.
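As a rough illustration of what evaluating intercoder reliability can
involve, here is a minimal Python sketch of Cohen's kappa, one widely used
chance-corrected agreement statistic for two coders labelling the same items.
It is not necessarily the measure used in the Lampert and Ervin-Tripp
article, and the function name and the "request"/"inform" labels are
hypothetical, chosen only for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items in the same order.

    Sketch only: the labels are assumed to be hashable category codes
    (e.g. hypothetical dialogue-act tags).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)

    # Observed agreement: proportion of items both coders labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each coder's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # If both coders used one and the same label throughout, kappa is
    # undefined (0/0); by convention we report perfect agreement here.
    if p_e == 1:
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes from two coders for the same four utterances.
coder_a = ["request", "request", "inform", "inform"]
coder_b = ["request", "inform", "inform", "inform"]
print(cohens_kappa(coder_a, coder_b))  # 0.5
```

Here the coders agree on 3 of 4 items (observed agreement 0.75), but their
label distributions alone would give 0.5 agreement by chance, so kappa is
(0.75 - 0.5) / (1 - 0.5) = 0.5.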

For an array of different types of coding,
I'd recommend the deliverables from the MATE project, which are
available online.