[Corpora-List] BOUNCE corpora at lists.uib.no: Non-member submission from [David Reitter <dreitter at inf.ed.ac.uk>] (fwd)

Knut Hofland knut at aksis.uib.no
Thu Jul 14 14:23:31 UTC 2005


From: David Reitter <dreitter at inf.ed.ac.uk>
Subject: Re: [Corpora-List] Communicator corpora parsed?
Date: Thu, 14 Jul 2005 14:13:59 +0100
To: corpora at hd.uib.no

I received two replies to my earlier question regarding the
availability of syntactic annotations of the DARPA Communicator
corpus and of other spoken dialogue corpora.
Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State
recommended the Verbmobil treebanks, which contain spoken dialogue in
German, English and Japanese. They are available via

http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html

A newer version of the German treebank is in preparation.

As a side note: much (if not most) of the non-canned, spontaneous
speech in Communicator consists of very short utterances. In
contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
spoken human-human dialogue) has a lot to offer in terms of syntax.

Thanks for the replies.


> is anyone aware of syntactic annotations of the (e.g. DARPA)
> Communicator corpus, or similar large, task-oriented human/machine
> or human/human dialogue corpora?
> I'm looking for tree structures, and atomic categories such as VP
> or PP would do just fine. I could work with non-perfect (i.e.
> machine-parsed) annotations.
> Generally I'd be grateful for tips regarding larger spoken
> dialogue corpora (task-oriented dialogue) that have been
> syntactically annotated.


