[Corpora-List] Communicator corpora parsed?

David Reitter david.reitter at gmail.com
Fri Jul 15 13:43:29 UTC 2005


I received two replies to my earlier question regarding the  
availability of syntactic annotations of the DARPA Communicator  
corpus and of other spoken dialogue corpora.
Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State  
recommended the Verbmobil treebanks, which contain spoken dialogue in  
German, English and Japanese. They are available via

http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html

A newer version of the German treebank is in preparation.

As a side note: much (if not most) of the non-canned, spontaneous
speech in Communicator consists of very short utterances. In
contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
spoken human-human dialogue) has a lot to offer in terms of syntax.

Thanks for the replies.



> is anyone aware of syntactic annotations of the (e.g. DARPA)
> Communicator corpus, or similar large, task-oriented human/machine
> or human/human dialogue corpora?
> I'm looking for tree structures, and atomic categories such as VP
> or PP would do just fine. I could work with non-perfect (i.e.
> machine-parsed) annotations.
> Generally I'd be grateful for tips regarding larger spoken
> dialogue corpora (task-oriented dialogue) that have been
> syntactically annotated.
>

--
David Reitter - ICCS/HCRC, Informatics, University of Edinburgh


