Dialogue Data: Dialogue Diversity Corpus Version 2.0
bill_mann at SIL.ORG
Fri Sep 19 14:44:35 UTC 2003
DIALOGUE DIVERSITY CORPUS: Version 2.0
(apologies if you receive multiple copies)
A new release of the Dialogue Diversity Corpus (DDC) is available for facilitating research on human dialogue.
The DDC gives direct access to a set of dialogue transcripts (13 sources, more than 12 hours of dialogue, all in English). It also gives a set of links and methods for indirect access to hundreds of additional dialogues (principally in English). Many sources provide speech data as well as transcripts. The emphasis is on free or inexpensive access.
Version 2.0 presents access to hundreds of dialogues that were not represented in the original release in October 2002. It is more diverse in terms of situations and dynamic patterns. Access to oral history interviews, the Watergate tapes (by several paths), diverse regional varieties of English (both British and international), the just-emerging American National Corpus (ANC), the U.S. Supreme Court, and other originally non-linguistic sources is presented for the first time.
The dialogues in this corpus occurred in a very diverse collection of interactive situations. Thus it is a data resource for studies of the breadth of coverage of particular dialogue models, and for studies that compare dialogue from different situations.
For smaller projects such as pilot studies, computer program testing and even some term papers, the direct access portion can be sufficient. The indirect access methods yield enough dialogue data for some much larger studies.
The corpus is designed for data finding rather than for bulk processing. Taken as a whole, it is irregular and not homogeneous in any way. It is generally unsuitable for drawing any conclusions about dialogue taken as a single category.
William C. Mann
bill_mann at sil.org