<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<div align="center">LDC2007S05<br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007S05">CSLU:

Yes/No Version 1.2</a>  -</b><br>

<br>

LDC2007T24<br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T24">GALE

Phase 1 Arabic Broadcast News Parallel Text - Part 1</a>  -</b><br>

<br>

LDC2007S09<br>

<b>-  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007S09">Mandarin

Affective Speech</a>  -</b><br>

<br>

<hr size="2" width="100%"></div>

<b><br>

</b>

<div align="center"><b>New Publications<br>

<br>

</b></div>

<p>(1) <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007S05">

CSLU:

Yes/No Version 1.2</a> is a collection of answers to yes/no

questions from various telephone speech corpora created by the Center

for Spoken Language Understanding (CSLU), Oregon Health and Science

University. The corpus contains approximately 20,000 examples of roughly

18,000 speakers saying "yes" or "no" in response to various questions. </p>

<p>Each speech file in the corpus has a corresponding orthographic

transcription following the CSLU Labeling Conventions. In cases where a

transcription did not already exist, the utterance was run through a

speech recognizer to automatically obtain the transcription. </p>

<p>The data were collected from both analog and digital phone lines.

The analog data were recorded using a Gradient Technologies

analog-to-digital conversion box. These files were recorded at 8 kHz

with 16-bit samples and stored in a linear format.</p>
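<p>As a rough illustration of what these recording parameters imply, the
sketch below converts a count of audio data bytes into a duration. It
assumes single-channel audio and that any file header has already been
excluded from the byte count; neither point is stated above, and the
function name is hypothetical.</p>

```python
SAMPLE_RATE_HZ = 8000   # 8 kHz, as stated above
BYTES_PER_SAMPLE = 2    # 16-bit linear samples

def pcm_duration_seconds(num_data_bytes):
    """Duration of 16-bit, 8 kHz linear audio, assuming one channel
    and a byte count that excludes any file header (both assumptions)."""
    if num_data_bytes % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a whole number of 16-bit samples")
    return num_data_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)

# One second of audio occupies 8000 samples * 2 bytes = 16000 bytes.
print(pcm_duration_seconds(32000))  # 2.0
```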

<br>

<div align="center">*<br>

<br>

</div>

<p>(2) <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T24">

GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1</a> is

the first part of the three-part GALE Phase 1 Arabic Broadcast News

Parallel Text, which, along with other corpora, was used as training

data in year 1 (Phase 1) of the DARPA-funded GALE program. This corpus

contains transcripts and English translations of 17 hours of Arabic

broadcast news programming selected from a variety of sources.  A

manual selection procedure was used to choose data appropriate for the

GALE program, namely, news and conversation programs focusing on

current events. Stories on topics such as sports, entertainment news,

and stock market reports were excluded from the data set.</p>

<p>The selected audio snippets were then carefully transcribed by LDC

annotators and professional transcription agencies following LDC's

Quick Rich Transcription specification. Manual sentence unit/segment

(SU) annotation was also performed as part of the transcription task.

Three types of end-of-sentence SU are identified:</p>

<ul>

  <li>statement SU</li>

  <li>question SU</li>

  <li>incomplete SU</li>

</ul>

<p>After transcription and SU annotation, the files were reformatted

into a human-readable translation format and were then assigned to

professional translators for careful translation. Translators followed

LDC's GALE translation guidelines, which describe the makeup of the

translation team, the source data format, the translation data format,

best practices for translating certain linguistic features (such as

names and speech disfluencies), and quality control procedures applied

to completed translations.</p>

<p>All final data are in Tab Delimited Format (TDF). TDF is compatible

with other transcription formats, such as the Transcriber format and AG

format, and it is easy to process.  Each line of a TDF file corresponds

to a speech segment and contains 13 tab-delimited fields.  A source TDF

file and its translation file are identical except that the transcript in

the source file is replaced by its English translation in the translation

file.</p>
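<p>As a minimal sketch of how such a file might be processed, the Python
function below splits each line on tabs and sets aside any line that does
not have exactly 13 fields. The function name and the sample lines are
hypothetical, and no field names or header conventions are assumed, since
none are given above.</p>

```python
EXPECTED_FIELDS = 13  # per the TDF description above

def split_tdf_lines(lines):
    """Split lines into tab-delimited field lists.

    Returns (segments, rejected): the field lists for lines with exactly
    13 fields, and the 1-based numbers of non-empty lines that have any
    other field count.
    """
    segments = []
    rejected = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) == EXPECTED_FIELDS:
            segments.append(fields)
        else:
            rejected.append(lineno)
    return segments, rejected

# Hypothetical sample: one well-formed line and one short line.
sample = ["\t".join(f"field{i}" for i in range(1, 14)) + "\n", "only\ttwo\n"]
segments, rejected = split_tdf_lines(sample)
print(len(segments), rejected)
```

<p>Reporting unexpected lines instead of silently dropping them makes it
easier to notice any lines in a real file that are not segment records.</p>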

<br>

<br>

<div align="center">*<br>

</div>

<p><br>

(3)  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007S09">Mandarin

Affective Speech</a> is a database of emotional speech

consisting of audio recordings and corresponding transcripts collected

in 2005 at the Advanced Computing and System Laboratory, College of

Computer Science and Technology, Zhejiang University, Hangzhou,

People's Republic of China. This corpus was designed with two goals:

first, to serve as a tool for linguistic and prosodic feature

investigation of emotional expression in Mandarin Chinese; and second,

to provide a source of training and test data essential to support

research in speaker recognition with affective speech. The recordings

were made by having speakers read scenarios designed to elicit a

particular emotional response.  The five emotional states

recorded are characterized as follows: </p>

<ul>

  <li>Neutral - Simple statements without any emotion. </li>

  <li>Anger - A strong feeling of displeasure or hostility. </li>

  <li>Elation - Gladness or happiness in response to praise. </li>

  <li>Panic - A sudden, overpowering terror, often affecting many

people at once. </li>

  <li>Sadness - A state of sorrow or unhappiness. </li>

</ul>

<p>Recordings from 68 speakers (23 female, 45 male) were used in this

corpus. Subjects were given a text to read that consisted of five

phrases, fifteen sentences, and two paragraphs designed to elicit

emotional speech. The material included all the phonemes in Mandarin.

Each subject read the phrases, sentences, and paragraphs while

portraying each of the five emotional states.  Altogether, this database

contains 25,636 utterances.</p>

<br>

<hr size="2" width="100%"><br>

<div align="center"><small><font face="Courier New, Courier, monospace"><br>

Ilya

Ahtaridis<br>

Membership Coordinator</font></small><br>

--------------------------------------------------------------------

<font face="Courier New, Courier, monospace"><br>

</font></div>

<div align="center">

<pre class="moz-signature" cols="72"><b><small><font

 face="Courier New, Courier, monospace">

</font></small>Linguistic Data Consortium                     Phone: (215) 573-1275

University of Pennsylvania                       Fax: (215) 573-2175

3600 Market St., Suite 810                         <a

 class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104                      <a

 class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></b></pre>

</div>

<br>

</body>

</html>