<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2730.1700" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2><FONT size=2>
<P align=center>Announcement</P>
<P align=center>DIALOGUE DIVERSITY CORPUS: Version 2.0</P></FONT><U><FONT
color=#0000ff size=2>
<P
align=center><A href="http://www-rcf.usc.edu/~billmann/diversity">http://www-rcf.usc.edu/~billmann/diversity</A></P></U></FONT><FONT
size=2>
<P align=center>(apologies if you receive multiple copies)</P>
<P>A new release of the Dialogue Diversity Corpus (DDC) is available to
facilitate research on human dialogue. </P>
<P>The DDC gives <STRONG>direct</STRONG> access to a set of dialogue transcripts
(13 sources, more than 12 hours of dialogue, all in English). It also gives a
set of links and methods for <STRONG>indirect access to hundreds of additional
dialogues</STRONG> (principally in English). Many sources provide speech data as
well as transcripts. The emphasis is on free or inexpensive access. </P>
<P>Version 2.0 provides access to hundreds of dialogues that were not represented
in the original release in October 2002. It is more diverse in terms of
situations and dynamic patterns. Access to oral history interviews, the
Watergate tapes (by several paths), diverse regional varieties of English (both
British and international), the just-emerging American National Corpus (ANC),
the U.S. Supreme Court, and other originally non-linguistic sources is
presented for the first time. </P>
<P>The dialogues in this corpus occurred in a very diverse collection of
interactive situations. Thus it is a data resource for studies of the breadth of
coverage of particular dialogue models, and for studies that compare dialogue
from different situations. </P>
<P>For smaller projects such as pilot studies, computer program testing and even
some term papers, the direct access portion can be sufficient. The
indirect access methods yield enough dialogue data for some much larger studies.
</P>
<P>The corpus is designed for data finding rather than for bulk processing.
Taken as a whole, it is irregular and not homogeneous in any way. It is
generally unsuitable for drawing any conclusions about dialogue taken as a
single category.</P>
<P>===============<BR>William C. Mann</P></FONT><U><FONT color=#0000ff size=2>
<P><A
href="mailto:bill_mann@sil.org">bill_mann@sil.org</A></U></FONT></P></FONT></DIV></BODY></HTML>