[Corpora-List] Communicator corpora parsed?
David Reitter
david.reitter at gmail.com
Fri Jul 15 13:43:29 UTC 2005
I received two replies to my earlier question regarding the
availability of syntactic annotations of the DARPA Communicator
corpus and of other spoken dialogue corpora.
Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State
recommended the Verbmobil treebanks, which contain spoken dialogue in
German, English and Japanese. They are available via
http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html
A newer version of the German treebank is in preparation.
As a side note: much (if not most) of the non-canned, spontaneous
speech in Communicator consists of very short utterances. In
contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
spoken human-human dialogue) has a lot to offer in terms of syntax.
Thanks for the replies.
> is anyone aware of syntactic annotations of the (e.g. DARPA)
> Communicator corpus, or similar large, task-oriented human/machine
> or human/human dialogue corpora?
> I'm looking for tree structures, and atomic categories such as VP
> or PP would do just fine. I could work with non-perfect (i.e.
> machine-parsed) annotations.
> Generally I'd be grateful for tips regarding larger spoken
> dialogue corpora (task-oriented dialogue) that have been
> syntactically annotated.
>
--
David Reitter - ICCS/HCRC, Informatics, University of Edinburgh