[Corpora-List] BOUNCE corpora at lists.uib.no: Non-member submission from [David Reitter <dreitter at inf.ed.ac.uk>] (fwd)
Knut Hofland
knut at aksis.uib.no
Thu Jul 14 14:23:31 UTC 2005
From: David Reitter <dreitter at inf.ed.ac.uk>
Subject: Re: [Corpora-List] Communicator corpora parsed?
Date: Thu, 14 Jul 2005 14:13:59 +0100
To: corpora at hd.uib.no
I received two replies to my earlier question regarding the
availability of syntactic annotations of the DARPA Communicator
corpus and of other spoken dialogue corpora.
Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State
recommended the Verbmobil treebanks, which contain spoken dialogue in
German, English and Japanese. They are available via
http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html
A newer version of the German treebank is in preparation.
As a side note: much (if not most) of the non-canned, spontaneous
speech in Communicator consists of very short utterances. In
contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
spoken human-human dialogue) has a lot to offer in terms of syntax.
Thanks for the replies.
> is anyone aware of syntactic annotations of the (e.g. DARPA)
> Communicator corpus, or similar large, task-oriented human/machine
> or human/human dialogue corpora?
> I'm looking for tree structures, and atomic categories such as VP
> or PP would do just fine. I could work with non-perfect (i.e.
> machine-parsed) annotations.
> Generally I'd be grateful for tips regarding larger spoken
> dialogue corpora (task-oriented dialogue) that have been
> syntactically annotated.