<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<div class="moz-text-html" lang="x-western">

<p align="center">LDC2006S35<br>

<b><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S35">CSLU:

Multilanguage Telephone Speech Version 1.2</a><br>

</b></p>

<p align="center">LDC2006S31<br>

<b><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S31">NIST

2003 Language Recognition Evaluation</a><br>

</b></p>

<p align="center">LDC2006T12<br>

<b><a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T12">Spanish

Gigaword First Edition</a><br>

<br>

</b></p>

<p align="center">The Linguistic Data

Consortium (LDC)

would

like to announce the availability of three new publications.<br>

<br>

</p>

<hr size="2" width="100%">

<p><br>

(1) The <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S35">CSLU: 

Multilanguage Telephone Speech Version 1.2</a> corpus consists of

telephone speech from eleven languages: English, Farsi, French, German,

Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. The

corpus contains fixed-vocabulary utterances (e.g., days of the week) as

well as fluent continuous speech. The current release includes recorded

utterances from about 2052 speakers, for a total of about 38.5 hours of

speech. Time-aligned phonetic transcriptions for 619 of the utterances

are also included. The data were collected at a sampling rate of 8 kHz

and stored in 16-bit linear format on a UNIX file system.

Each utterance was recorded as a separate file.<br>

</p>

<p align="center">*<br>

</p>

<p>(2) The goal of the NIST Language Recognition Evaluation (LRE) is to

establish the baseline of current performance capability for language

recognition of conversational telephone speech and to lay the

groundwork for further research efforts in the field. The first

evaluation in the series was held in 1996. The <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S31">2003

NIST Language Recognition Evaluation</a> (LRE-03) was part of this

ongoing series of evaluations of

language recognition technology.  The task evaluated was the detection

of a given target language. Given a test segment of speech, a target

language was assigned as a test hypothesis, and the task was to

determine whether this test hypothesis was true or false. </p>

<p align="left">Each speech file is one side of a "4-wire" telephone

conversation

represented as 8-bit, 8 kHz mu-law data. There are 7990 speech files in

SPHERE (.sph) format for a total of around six hours of speech. The

speech data was compiled from the LDC's CALLFRIEND, CALLHOME, and

SWITCHBOARD-2 corpora.<br>

</p>

<p align="center">*<br>

</p>

<p>(3) The <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T12">Spanish

Gigaword First Edition</a> is a comprehensive archive of

newswire text data that has been acquired over several years by the

Linguistic Data Consortium; some of the data included has been released

previously in other LDC corpora.</p>

<p>The three distinct international sources of Spanish newswire in this

edition, and the time spans of collection covered for each, are as

follows:</p>

<ul>

  <li>Agence France-Presse, Spanish Service, May 1994 - Dec

2005 </li>

  <li>Associated Press Worldstream, Spanish, Nov 1993 - Dec

2005 </li>

  <li>Xinhua News Agency, Spanish Service, Sep 2001 - Dec 2005 </li>

</ul>

<br>

<hr size="2" width="100%"><br>

<br>

<div align="center"><font face="Courier New"><small><big><font

 face="Times New Roman">If

you need further

information, or would like to inquire about

membership in the LDC, please email <a class="moz-txt-link-abbreviated"

 href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a> or call +1 215

573 1275.</font></big></small></font><br>

</div>


<div align="center">--------------------------------------------------------------------<br>

</div>

<div align="center">

<pre class="moz-signature" cols="72">Linguistic Data Consortium                     Phone: (215) 573-1275

University of Pennsylvania                       Fax: (215) 573-2175

3600 Market St., Suite 810                         <a

 class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                  <a

 class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>

</div>

</div>

</body>

</html>