<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<div class="moz-text-html" lang="x-western">
<p align="center">LDC2006S35<b><br>
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S35">CSLU:
Multilanguage Telephone Speech Version 1.2</a><br>
</b></p>
<p align="center">LDC2006S31<br>
<b><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S31">NIST
2003 Language Recognition Evaluation</a><br>
</b></p>
<p align="center">LDC2006T12<br>
<b><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T12">Spanish
Gigaword First Edition</a><br>
<br>
</b></p>
<p align="center">The Linguistic Data
Consortium (LDC)
would
like to announce the availability of three new publications.<br>
<br>
</p>
<hr size="2" width="100%">
<p><br>
(1) The <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S35">CSLU:
Multilanguage Telephone Speech Version 1.2</a> corpus consists of
telephone speech from eleven languages: English, Farsi, French, German,
Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. The
corpus contains fixed-vocabulary utterances (e.g., days of the week) as
well as fluent continuous speech. The current release includes recorded
utterances from about 2052 speakers, totaling about 38.5 hours of
speech. Time-aligned phonetic transcriptions for 619 of the utterances
are also included. The data were collected at a sampling rate of 8 kHz,
and the files were stored in 16-bit linear format on a UNIX file system.
Each utterance was recorded as a separate file. <br>
</p>
<p align="center">*<br>
</p>
<p>(2) The goal of the NIST Language Recognition Evaluation (LRE) is to
establish a baseline of current performance for language
recognition of conversational telephone speech and to lay the
groundwork for further research in the field. The series held
its first evaluation in 1996. The <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S31">2003
NIST Language Recognition Evaluation</a> (LRE-03) was part of this
ongoing series of evaluations of
language recognition technology. The task evaluated was the detection
of a given target language: given a test segment of speech and a target
language assigned as the test hypothesis, the task was to
determine whether that hypothesis was true or false, that is, whether the
segment was spoken in the target language. </p>
<p align="left">Each speech file is one side of a "4-wire" telephone
conversation
represented as 8-bit, 8 kHz mu-law data. There are 7990 speech files in
NIST SPHERE (.sph) format, totaling around six hours of speech. The
speech data were compiled from the LDC's CALLFRIEND, CALLHOME, and
SWITCHBOARD-2 corpora.<br>
</p>
<p align="center">*<br>
</p>
<p>(3) The <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T12">Spanish
Gigaword First Edition</a> is a comprehensive archive of
newswire text data that has been acquired over several years by the
Linguistic Data Consortium; some of the data included has been released
previously in other LDC corpora.</p>
<p>This edition contains Spanish newswire from three distinct international
sources, collected over the following time spans:</p>
<ul>
<li>Agence France-Presse, Spanish Service, May 1994 - Dec
2005 </li>
<li>Associated Press Worldstream, Spanish, Nov 1993 - Dec
2005 </li>
<li>Xinhua News Agency, Spanish Service, Sep 2001 - Dec 2005 </li>
</ul>
<br>
<hr size="2" width="100%"><br>
<br>
<div align="center"><font face="Courier New"><small><big><font
face="Times New Roman">If
you need further
information or would like to inquire about
membership in the LDC, please email <a class="moz-txt-link-abbreviated"
href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a> or call +1 215
573 1275.</font></big></small></font><br>
</div>
<p><font face="Courier New"><small><br>
<br>
</small></font>
</p>
<div align="center">--------------------------------------------------------------------<br>
</div>
<div align="center">
<pre class="moz-signature" cols="72">Linguistic Data Consortium Phone: (215) 573-1275
University of Pennsylvania Fax: (215) 573-2175
3600 Market St., Suite 810 <a
class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a
class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>
</div>
</div>
</body>
</html>