<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<div class="moz-text-html" lang="x-western">

<div align="center"><b> 

Free Talkbank Corpora Still Available!</b><br>

<br>

LDC2005T33<br>

<b><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T33">BBN

Pronoun Coreference and Entity Type Corpus</a></b><br>

</div>

<p align="center">LDC2005T23<br>

<b><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T23">Chinese

Proposition Bank 1.0</a></b><br>

</p>

<p align="center">LDC2005S25<br>

<b><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005S25">Santa

Barbara Corpus of Spoken American English Part-IV</a></b><br>

<br>

<br>

</p>

<p align="center">The Linguistic Data

Consortium would like to

announce the availability of

free TalkBank data and three new corpora.<br>

<br>

</p>

<hr size="2" width="100%">

<p><a href="http://www.talkbank.org/">TalkBank</a> is an interdisciplinary

research project funded by a five-year NSF grant to foster research and

development in communicative behavior by providing tools and standards

for analysis and distribution of language data.  The LDC distributes

the following TalkBank corpora:<br>

<br>

  LDC2003V01  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003V01">FORM2

Kinematic Gesture</a>  -  gesture annotation scheme designed to capture

the kinematic information in gesture from videos of speakers<br>

 <br>

  LDC2003L01  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003L01">Grassfields

Bantu Fieldwork: Dschang Lexicon</a>  - spoken lexicon with 5000+ sound

files<br>

 <br>

  LDC2003S02  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003S02">Grassfields

Bantu Fieldwork: Dschang Tone Paradigms</a>  - tone paradigms along

with phonetic and tonological transcriptions<br>

 <br>

  LDC2001S16  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2001S16">Grassfields

Bantu Fieldwork: Ngomba Tone Paradigms</a>  - tone paradigms along with

phonetic and tonological transcriptions<br>

 <br>

  LDC2004L01  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004L01">Klex:

Finite-State Lexical Transducer for Korean</a>  - for morphological

analysis and generation applications<br>

 <br>

  LDC2004T03  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T03">Morphologically

Annotated Korean Text</a>  - annotated morphological analysis and

part-of-speech tags<br>

 <br>

  LDC2003T15  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T15">SLX

Corpus of Classic Sociolinguistic Interviews</a> - 8 interviews

conducted by William Labov, plus transcripts, variable survey and

annotation tools<br>

 <br>

  LDC2003S06  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003S06">Santa

Barbara Corpus of Spoken American English Part-II</a>  - recordings of

natural speech from all over the U.S.<br>

<br>

  LDC2004S10  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S10">Santa

Barbara Corpus of Spoken American English III</a> - recordings of

natural speech from all over the U.S.<br>

<br>

  LDC2005S25  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005S25">Santa

Barbara Corpus of Spoken American English Part-IV</a> - over 5 hours of

recordings of natural speech from all over the U.S.<br>

 <br>

  LDC2004S12  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S12">Talkbank

Ethology Data: Field Recordings of Vervet Monkey Calls</a> - 60

recordings with corresponding annotations<br>

<br>

Grant-sponsored copies of all of the above corpora are still

available.  Shipping and handling charges apply.  Please contact the

LDC to learn whether your organization is

eligible to receive a free copy.<br>

</p>

<p align="center">*</p>

<p><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T33">BBN

Pronoun Coreference and Entity Type Corpus</a> supplements the

one-million-word Penn Treebank corpus of Wall Street Journal texts

(LDC95T7). The corpus contains stand-off annotation of pronoun

coreference, indicated by sentence and token numbers, as well as

annotation of a variety of entity and numeric types. All annotation was

done by hand at BBN using proprietary annotation tools. This corpus was

developed by BBN to support the ACE and AQUAINT programs.</p>

<p>The corpus contains two components: </p>

<ul>

  <li>

    <p>Pronoun coreference. Stand-off annotation of pronoun coreference

of the WSJ corpus is provided in a single file. Pronouns and

antecedents are indexed by sentence and token numbers.</p>

  </li>

  <li>

    <p>Entity types. The corpus includes annotation of 12 named entity

types (Person, Facility, Organization, GPE, Location, Nationality,

Product, Event, Work of Art, Law, Language, and Contact-Info), nine

nominal entity types (Person, Facility, Organization, GPE, Product,

Plant, Animal, Substance, Disease and Game), and seven numeric types

(Date, Time, Percent, Money, Quantity, Ordinal and Cardinal). Several

of these types are further divided into subtypes. Annotation for a

total of 64 subtypes is provided.</p>

  </li>

</ul>
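<p>As a rough illustration of how stand-off references indexed by sentence
and token numbers can be resolved against tokenized text, here is a minimal
Python sketch. The <code>TokenRef</code> type, the <code>resolve</code>
helper, the 1-based numbering, and the sample sentences are all assumptions
made for this example; they do not reproduce the corpus's actual file format,
and a real antecedent may span more than one token.</p>

```python
from typing import NamedTuple


class TokenRef(NamedTuple):
    """Hypothetical stand-off pointer: a sentence number plus a token number."""
    sentence: int
    token: int


def resolve(sentences, ref, base=1):
    """Return the token that a stand-off reference points at.

    `base` is the index of the first sentence/token. 1-based numbering is
    an assumption here; check the corpus documentation for the real scheme.
    """
    return sentences[ref.sentence - base][ref.token - base]


# Invented sample text, not taken from the corpus.
sentences = [
    ["The", "company", "reported", "earnings", "."],
    ["It", "beat", "forecasts", "."],
]

# A pronoun and its antecedent, each stored apart from the text itself.
pronoun = TokenRef(sentence=2, token=1)
antecedent = TokenRef(sentence=1, token=2)

print(resolve(sentences, pronoun), "refers to", resolve(sentences, antecedent))
```

<p>Because the annotation only stores positions, the underlying text stays
untouched; the trade-off is that the references are meaningful only against
the exact tokenization they were created for.</p>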

<br>

<div align="center">*<br>

</div>

<br>

<a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T23">Chinese

Proposition Bank 1.0</a> is the first public release of the Penn

Chinese Proposition Bank project, which aims to create a corpus of text

annotated with information about basic semantic propositions.

Specifically, predicate-argument relations have been added to the

syntactic trees of Chinese Treebank 5.1 as an additional layer of

annotation.

<p>Chinese Proposition Bank 1.0 includes annotations of the first 250K

words of Chinese Treebank 5.1.  There are a total of 37,183

propositions. Auxiliary verbs are not annotated. Some verbs have both

light-verb and non-light-verb uses; in these cases only the

non-light-verb uses are annotated. All the annotations in this release

are the result of double-blind annotation followed by adjudication of

differences. <br>

</p>

<p><font face="Times New Roman"><small><big><br>

</big></small></font></p>

<p align="center"><font face="Times New Roman"><small><big>*<br>

</big></small></font></p>

<p><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005S25">Santa

Barbara Corpus of Spoken American English Part-IV</a> is based on

hundreds of recordings of natural speech from all over the United

States, representing a wide variety of people of different regional

origins, ages, occupations, and ethnic and social backgrounds. It

reflects many ways that people use language in their lives:

conversation, gossip, arguments, on-the-job talk, card games, city

council meetings, sales pitches, classroom lectures, political

speeches, bedtime stories, sermons, weddings, and more.  The corpus was

collected by the University of California, Santa Barbara Center for the

Study of Discourse.<br>

</p>

The audio data consists of 14 WAV-format speech files, recorded in

two-channel PCM at 22,050 Hz. The speech files total 5.75 hours of audio

(1.5 GB), representing over 58,000 words and over 6,000 unique words in

the transcribed text.  <br>
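<p>For readers who want to confirm the channel count, sample rate, and
duration of a speech file after receiving the data, the following is a
minimal sketch using Python's standard <code>wave</code> module. The
<code>describe_wav</code> helper and its dictionary keys are names invented
for this example, and it assumes the files are uncompressed PCM WAV that the
module can open.</p>

```python
import wave


def describe_wav(path):
    """Read basic properties from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()   # 2 for a two-channel recording
        rate = wav.getframerate()       # samples per second, per channel
        width = wav.getsampwidth()      # bytes per sample
        frames = wav.getnframes()       # sample frames in the file
    return {
        "channels": channels,
        "sample_rate_hz": rate,
        "sample_width_bytes": width,
        "duration_s": frames / rate,
    }
```

<p>Calling <code>describe_wav("example.wav")</code> (a hypothetical file
name) returns a dictionary; summing <code>duration_s</code> over all files
and dividing by 3600 gives the total hours of audio.</p>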

<br>

The cost of the first 100 copies of this publication (not counting the

copies distributed to LDC members) is covered by NSF Grant Number

BCS-998009, so these copies are free of charge to qualified researchers; a

$30 shipping and handling fee applies. After these first 100 copies are

distributed, additional copies will be available for the production

cost of $200 per DVD-ROM.<br>

<font face="Times New Roman"><small><big><br>

</big></small></font>

<hr size="2" width="100%"><font face="Times New Roman"><small><big><br>

</big></small></font>

<div align="center"><big><font face="Times New Roman"><small>If you

need further

information, or would like to inquire about

membership to the LDC, please email <a class="moz-txt-link-abbreviated"

 href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a> or call +1 215

573 1275.<br>

<br>

</small></font></big></div>

<div align="center">--------------------------------------------------------------------<br>

</div>

<div align="center">

<pre class="moz-signature" cols="72">Linguistic Data Consortium                     Phone: (215) 573-1275

3600 Market Street                             Fax:   (215) 573-2175

Suite 810                                          <a

 class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104                      <a

 class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>

</div>

</div>

</body>

</html>