<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">



</HEAD>

<BODY bgColor=#ffffff>

<DIV><FONT face=Arial size=2><FONT size=2>

<P align=center>Announcement</P>

<P align=center>DIALOGUE DIVERSITY CORPUS: Version 2.0</P></FONT><U><FONT 

color=#0000ff size=2>

<P align=center><A href="http://www-rcf.usc.edu/~billmann/diversity">http://www-rcf.usc.edu/~billmann/diversity</A></P></FONT></U><FONT 
size=2>

<P align=center>(apologies if you receive multiple copies)</P>

<P>A new release of the Dialogue Diversity Corpus (DDC) is now available to 
facilitate research on human dialogue.</P>

<P>The DDC provides <STRONG>direct </STRONG>access to a set of dialogue transcripts 
(13 sources, more than 12 hours of dialogue, all in English). It also provides a 
set of links and methods for <STRONG>indirect access to hundreds of additional 
dialogues</STRONG> (principally in English). Many sources provide speech data as 
well as transcripts. The emphasis is on free or inexpensive access.</P>

<P>Version 2.0 provides access to hundreds of dialogues that were not represented 
in the original release of October 2002, and it is more diverse in the 
situations and dynamic patterns it covers. For the first time, it gives access to 
oral history interviews, the Watergate tapes (by several paths), diverse regional 
varieties of English (both British and international), the just-emerging American 
National Corpus (ANC), the U.S. Supreme Court, and other sources that were not 
originally compiled for linguistic purposes.</P>

<P>The dialogues in this corpus come from a very diverse range of interactive 
situations. This makes the corpus a data resource for studies of how broadly 
particular dialogue models apply, and for studies that compare dialogue across 
different situations.</P>

<P>For smaller projects, such as pilot studies, computer program testing, and even 
some term papers, the direct-access portion may be sufficient. The indirect-access 
methods yield enough dialogue data for some much larger studies.</P>

<P>The corpus is designed for finding data rather than for bulk processing. 
Taken as a whole, it is irregular and in no way homogeneous, so it is generally 
unsuitable for drawing conclusions about dialogue treated as a single 
category.</P>

<P>===============<BR>William C. Mann</P></FONT><U><FONT color=#0000ff size=2>

<P><A href="mailto:bill_mann@sil.org">bill_mann@sil.org</A></P></FONT></U></FONT></DIV></BODY></HTML>