<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#ffffff">
<div class="moz-text-html" lang="x-western">
<div align="center"><font face="Times New Roman, Times, serif"><i><span
style="">New Publications:</span></i></font></div>
<p class="MsoNormal" style="margin-bottom: 12pt; line-height:
normal;" align="center"><font face="Times New Roman, Times,
serif"><span style="">LDC2011S01</span><b><br>
- <a href="#sre">2005
NIST Speaker Recognition Evaluation Training Data</a> -<br>
<br>
</b><span style="">LDC2011V03</span><b><br>
- <a href="#vace">NIST/USF Evaluation Resources
for the VACE Program - Meeting Data Test Set Part 3</a> -</b></font></p>
<hr width="100%" size="2"><br>
<div align="center"><font face="Times New Roman, Times, serif"><span
style=""><b>New Publications</b></span><br>
</font> </div>
<div align="center"><font face="Times New Roman, Times, serif"><br>
<span style=""></span></font> </div>
<p class="MsoNormal"><font face="Times New Roman, Times, serif"><a
name="sre"></a><span style="">(1) </span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S01"><span
style="">2005 NIST Speaker Recognition Evaluation Training
Data</span></a><span style=""> was developed at LDC and
NIST (National Institute of Standards and Technology). It
consists of 392 hours of
conversational telephone speech in English, Arabic, Mandarin
Chinese, Russian and Spanish and associated English
transcripts used as training data in the NIST-sponsored </span><a
href="http://www.itl.nist.gov/iad/mig/tests/spk/2005/index.html"><span
style="">2005 Speaker Recognition Evaluation</span></a><span
style=""> (SRE). The ongoing series of yearly SRE
evaluations conducted by NIST is intended to be of interest
to researchers working on the general problem of
text-independent speaker recognition. To that end, the evaluations
are designed to be simple, to focus on core technology
issues, to be fully supported and to be accessible to those
wishing to participate. <br>
<br>
The task of the 2005 SRE evaluation was speaker detection,
that is, to determine whether a specified speaker is
speaking during a given segment of conversational speech.
The task was divided into 20 distinct tests, each pairing
one of five training conditions with one of four test
conditions. <br>
<br>
The speech data consists of conversational telephone speech
with "multi-channel" data collected simultaneously from a
number of auxiliary microphones. The files are organized
into two segment types: 10-second two-channel excerpts
(continuous segments from single conversations that are
estimated to contain approximately 10 seconds of actual
speech in the channel of interest) and 5-minute two-channel
conversations.</span></font></p>
<p class="MsoNormal"><font face="Times New Roman, Times, serif"><span
style="">The speech files are stored as 8-bit u-law speech
signals in separate SPHERE files. In addition to the
standard header fields, the SPHERE header for each file
contains some auxiliary information that includes the
language of the conversation and whether the data was
recorded over a telephone line.<br>
<br>
English language word transcripts in .cmt format were
produced using an automatic speech recognition (ASR) system
and exhibit error rates in the range of 15-30%. <br>
<br>
</span><span style=""><br>
</span></font></p>
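<p class="MsoNormal"><font face="Times New Roman, Times, serif"><span
style="">As an illustration only (not part of the release), the
SPHERE header layout and u-law coding described above can be read
with a few lines of Python. The field names in the sample header
below are hypothetical stand-ins for the auxiliary fields mentioned
in the text:</span></font></p>

```python
# Illustrative sketch: a minimal reader for a NIST SPHERE header
# (ASCII 'NIST_1A' magic line, header size in bytes, then
# 'name -type value' lines terminated by 'end_head'), plus a
# decoder for one 8-bit G.711 u-law sample.

def parse_sphere_header(raw: bytes) -> dict:
    """Parse a SPHERE header into a dict of typed field values."""
    lines = raw.decode("ascii").splitlines()
    assert lines[0] == "NIST_1A", "not a SPHERE file"
    fields = {"header_size": int(lines[1])}  # bytes reserved for header
    for line in lines[2:]:
        if line.strip() == "end_head":
            break
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":                 # integer field
            fields[name] = int(value)
        elif ftype.startswith("-r"):      # real-valued field
            fields[name] = float(value)
        else:                             # -sN: string of N characters
            fields[name] = value
    return fields

def ulaw_to_linear(u: int) -> int:
    """Decode one 8-bit u-law code to a 16-bit linear PCM sample."""
    u = ~u & 0xFF
    sign, exponent, mantissa = u & 0x80, (u >> 4) & 0x07, u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

# A toy header resembling the auxiliary information described above;
# the field names here are hypothetical, not taken from the corpus.
example = b"""NIST_1A
   1024
sample_rate -i 8000
channel_count -i 2
sample_coding -s4 ulaw
conversation_language -s7 English
end_head
"""
info = parse_sphere_header(example)
print(info["sample_rate"], info["conversation_language"])  # 8000 English
print(ulaw_to_linear(0xFF))                                # 0 (silence)
```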
<div align="center"><font face="Times New Roman, Times, serif"><b><span
style=""> *</span></b><br>
<span style=""></span></font> </div>
<p class="MsoNormal"><a name="vace"></a><font face="Times New
Roman, Times, serif"><span style="">(2) </span><a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V03"><span
style="">NIST/USF Evaluation Resources for the VACE
Program - Meeting Data Test Set Part 3</span></a><span
style="">, Linguistic Data Consortium (LDC) catalog number
LDC2011V03 and ISBN 1-58563-579-0, was developed by
researchers at the </span><a href="http://www.cse.usf.edu/"><span
style="">Department of Computer Science and Engineering</span></a><span
style="">, University of South Florida (USF), Tampa, Florida
and the </span><a href="http://nist.gov/itl/iad/mig/"><span
style="">Multimodal
Information Group</span></a><span style=""> at the
National Institute of Standards and Technology (NIST). It
contains approximately eleven hours of meeting room video
data collected in 2001 and 2002 at NIST's Meeting Data
Collection Laboratory and annotated for the VACE (Video
Analysis and Content Extraction) 2005 face, person and hand
detection and tracking tasks.<br>
<br>
<span style="color: black;">The VACE program was established
to develop novel algorithms for automatic video content
extraction, multi-modal fusion, and event understanding.
During VACE Phases I and II, the program made significant
progress in the automated detection and tracking of moving
objects including faces, hands, people, vehicles and text
in four primary video domains: broadcast news, meetings,
street surveillance, and unmanned aerial vehicle motion
imagery. Initial results were also obtained on automatic
analysis of human activities and understanding of video
sequences. <br>
<br>
Three performance evaluations were conducted under the
auspices of the VACE program between 2004 and 2007. The
2005 evaluation was administered by USF in collaboration
with NIST and guided by an advisory forum including the
evaluation participants. A summary of results of the
evaluation can be found in the </span></span><a
href="https://secure.ldc.upenn.edu/intranet/docs/VACE2005_report.pdf"><span
style="">2005 VACE results and analysis paper</span></a><span
style=""> included in this release. </span><span style=""><br>
<br>
NIST's Meeting Data Collection Laboratory is designed to
collect corpora to support research, development and
evaluation in meeting recognition technologies. It is
equipped to look and sound like a conventional meeting
space. The data collection facility includes five Sony
EVI-D30 video cameras, four of which have stationary views
of a center conference table (one view from each surrounding
wall) with a fixed focus and viewing angle, and an
additional "floating" camera that is used to focus on
particular participants, the whiteboard, or the conference table,
depending on the meeting forum. The data is captured in a
NIST-internal file format. The video data was extracted from
the NIST format and encoded using the MPEG-2 standard in
NTSC format. Further information concerning the video data
parameters can be found in the documentation included with this
corpus. <br>
</span></font><br>
</p>
<br>
<hr width="100%" size="2">
<div align="center"><br>
<pre class="moz-signature" cols="72">Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>
</div>
</div>
</body>
</html>