<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html;
charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#ffffff">
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><span style="font-size: 12pt;">- </span><span
style="font-size: 12pt;"><b><a href="#olympiad"><span
style="color: blue;">LDC Sponsors a Student Group at 2011
International Linguistics Olympiad</span></a></b> -</span></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><span style="font-size: 12pt;">- </span><span
style="font-size: 12pt;"><b><a href="#meta"><span style="color:
blue;">LDC Receives META Prize from META-NET</span></a></b>
-</span></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><i><span style="font-size: 12pt;">New
publications:</span></i><span style="font-size: 12pt;"></span></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><b><span style="font-size: 12pt;">- </span><span
style="font-size: 12pt;"><a href="#sre"><span style="color:
blue;">2005 NIST Speaker Recognition Evaluation Test Data</span></a>
-</span></b></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><b><span style="font-size: 12pt;">- </span><span
style="font-size: 12pt;"><a href="#std"><span style="color:
blue;">2006 NIST Spoken Term Detection Evaluation Set</span></a>
-</span></b></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><b><span style="font-size: 12pt;">- </span></b><span
style="font-size: 12pt;"><b><a href="#vace"><span style="color:
blue;">NIST/USF Evaluation Resources for the VACE Program
- Meeting Data Test Set Part 2</span></a></b><b> -</b></span><span
style="font-size: 12pt;"></span></p>
<div class="MsoNormal" style="margin-bottom: 0.0001pt; text-align:
center; line-height: normal;" align="center"><span
style="font-size: 12pt;">
<hr width="100%" align="center" size="2"> </span></div>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><span style="font-size: 12pt;"><br>
<b><a name="olympiad">LDC Sponsors a Student Group at 2011
International Linguistics Olympiad</a></b></span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">LDC is happy to support the 2011
International Linguistics Olympiad (IOL) by
sponsoring a student team. The IOL is one of the twelve <a
href="http://olympiads.win.tue.nl/"><span style="color: blue;">International
Science
Olympiads</span></a> and is an annual event that brings
together students from around the world to compete in
linguistically based challenges. This year’s competition takes
place July 24-30 at Carnegie Mellon University, Pittsburgh,
PA, USA. Students do not need a background in linguistics to
participate, since the competition problems are typically solved
through analysis and deductive reasoning. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">Please visit the 2011 <a
href="http://www.ioling.org/2011/"><span style="color: blue;">IOL
website</span></a> for additional details. We wish good luck
to all of the participants!</span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;"> </span></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><b><a name="meta"><span style="font-size:
12pt;">LDC Receives META Prize from META-NET</span></a></b><span
style="font-size: 12pt;"></span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;"> LDC was awarded a ‘2<sup>nd</sup> META
Prize’ from META-NET ‘for outstanding long term commitment to
the preparation and distribution of language resources and
technologies.’</span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;"> The META Prize is awarded by META-NET
to those who provide outstanding products or services that
support the European Multilingual Information Society. <a
href="http://www.meta-net.eu/mission"><span style="color:
blue;">META-NET</span></a> is a Network of Excellence
dedicated to fostering the technological foundations of a
multilingual European information society. Several organizations
were honored at this year’s META Forum in Budapest; LDC and <a
href="http://www.elra.info/"><span style="color: blue;">ELRA</span></a>
were both honored for supporting and developing language
resources.<br>
<br>
</span></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><b><span style="font-size: 12pt;">New
Publications </span></b></p>
<p class="MsoNormal" style="line-height: normal;"><a name="sre"><span
style="font-size: 12pt;">(1)</span></a><span style="font-size:
12pt;"> <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S04"><span
style="color: blue;">2005 NIST Speaker Recognition
Evaluation Test Data</span></a> was developed at LDC and
NIST (National Institute of Standards and Technology). It
consists of 525 hours of conversational telephone speech in
English, Arabic, Mandarin Chinese, Russian and Spanish and
associated English transcripts used as test data in the
NIST-sponsored <a
href="http://www.itl.nist.gov/iad/mig/tests/spk/2005/index.html"><span
style="color: blue;">2005 Speaker Recognition Evaluation</span></a>
(SRE). The ongoing series of yearly SREs conducted by NIST is
intended to be of interest to researchers working on the general
problem of text-independent speaker recognition. To that end, the
evaluations are designed to be simple, to focus on core technology
issues, to be fully supported and to be accessible. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">The task of the 2005 SRE was speaker
detection, that is, determining whether a specified speaker is
speaking during a given segment of conversational speech. The
task was divided into 20 distinct tests, each combining one of
five training conditions with one of four test conditions.
Further information about the task conditions is
contained in <a
href="http://www.itl.nist.gov/iad/mig/tests/sre/2005/sre-05_evalplan-v6.pdf"><span
style="color: blue;">The NIST Year 2005 Speaker Recognition
Evaluation Plan</span></a>. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">The speech data consists of
conversational telephone speech with "multi-channel" data
collected by LDC simultaneously from a number of auxiliary
microphones. The files are organized into two segment types:
10-second two-channel excerpts (continuous segments from single
conversations that are estimated to contain approximately 10
seconds of actual speech in the channel of interest) and
5-minute two-channel conversations.</span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">The data are stored as 8-bit u-law
speech signals in NIST SPHERE format. In addition to the
standard header fields, the SPHERE header for each file contains
some auxiliary information that includes the language of the
conversation and whether the data was recorded over a telephone
line. English-language word transcripts in .cmt format were
produced using an automatic speech recognition (ASR) system with
error rates in the range of 15-30%.</span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;"><br>
</span></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><span style="font-size: 12pt;">*</span></p>
<p class="MsoNormal" style="line-height: normal;"><a name="std"><span
style="font-size: 12pt;">(2)</span></a><span style="font-size:
12pt;"> <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S03"><span
style="color: blue;">2006 NIST Spoken Term Detection
Evaluation Set</span></a> was compiled by researchers at
NIST (National Institute of Standards and Technology) and
contains approximately eighteen hours of Arabic, Chinese and
English broadcast news, English conversational telephone speech
and English meeting room speech used in NIST's <a
href="http://www.itl.nist.gov/iad/mig/tests/std/2006/index.html"><span
style="color: blue;">2006 Spoken Term Detection (STD)
evaluation</span></a>. The STD initiative is designed to
facilitate research and development of technology for retrieving
information from archives of speech data with the goals of
exploring promising new ideas in spoken term detection,
developing advanced technology incorporating these ideas,
measuring the performance of this technology and establishing a
community for the exchange of research results and technical
insights. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">The 2006 STD task was to find all of
the occurrences of a specified "term" (a sequence of one or more
words) in a given corpus of speech data. The evaluation was
intended to develop technology for rapidly searching very large
quantities of audio data. Although the evaluation used modest
amounts of data, it was structured to simulate the very large
data situation and to make it possible to extrapolate the speed
measurements to much larger data sets. Therefore, systems were
implemented in two phases: indexing and searching. In the
indexing phase, the system processes the speech data without
knowledge of the terms. In the searching phase, the system uses
the terms, the index, and optionally the audio to detect term
occurrences. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">The evaluation corpus consists of three
data genres: broadcast news (BNews), conversational telephone
speech (CTS) and conference room meetings (CONFMTG). The
broadcast news material was collected in 2003 and 2004 by <a
href="http://www.ldc.upenn.edu/DataSheets/Broadcast_Collection_System_DS.pdf"><span
style="color: blue;">LDC's broadcast collection system </span></a>from
the following sources: ABC (English), Aljazeera (Arabic), China
Central TV (Chinese), CNN (English), CNBC (English), Dubai TV
(Arabic), New Tang Dynasty TV (Chinese), Public Radio
International (English) and Radio Free Asia (Chinese). The CTS
data was taken from the Switchboard data sets (e.g., <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98S75"><span
style="color: blue;">Switchboard-2 Phase 1 LDC98S75</span></a>,
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC99S79"><span
style="color: blue;">Switchboard-2 Phase 2 LDC99S79</span></a>)
and the Fisher corpora (e.g., <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S13"><span
style="color: blue;">Fisher English Training Speech Part 1
LDC2004S13</span></a>), also collected by LDC. The
conference room meeting material consists of goal-oriented,
small-group roundtable meetings and was collected in 2004 and
2005 by NIST, the International Computer Science Institute
(Berkeley, California), Carnegie Mellon University (Pittsburgh,
PA), TNO (The Netherlands) and Virginia Polytechnic Institute
and State University (Blacksburg, VA) as part of the <a
href="http://corpus.amiproject.org/"><span style="color:
blue;">AMI corpus project</span></a>. This evaluation corpus
includes scoring software, which uses the inputs described in the
STD evaluation plan to complete the evaluation of a system. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">Each BNews recording is a 1-channel,
PCM-encoded, 16 kHz, SPHERE-formatted file. CTS recordings are
2-channel, u-law encoded, 8 kHz, SPHERE-formatted files. The
CONFMTG files contain a single recorded channel.</span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;"><br>
<br>
</span></p>
<p class="MsoNormal" style="text-align: center; line-height:
normal;" align="center"><span style="font-size: 12pt;">*</span></p>
<p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height:
normal;"><a name="vace"><span style="font-size: 12pt;">(3)</span></a><span
style="font-size: 12pt;"> <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V04"><span
style="color: blue;">NIST/USF Evaluation Resources for the
VACE Program - Meeting Data Test Set Part 2</span></a> was
developed by researchers at the <a
href="http://www.cse.usf.edu/"><span style="color: blue;">Department
of Computer Science and Engineering</span></a>, University
of South Florida (USF), Tampa, Florida and the <a
href="http://nist.gov/itl/iad/mig/"><span style="color: blue;">Multimodal
Information Group</span></a> at the National Institute of
Standards and Technology (NIST). It contains approximately
thirteen hours of meeting room video data collected in 2001 and
2002 at NIST's Meeting Data Collection Laboratory and used in
the VACE (Video Analysis and Content Extraction) 2005
evaluation. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">The VACE program was established to
develop novel algorithms for automatic video content extraction,
multi-modal fusion, and event understanding. During VACE Phases
I and II, the program made significant progress in the automated
detection and tracking of moving objects including faces, hands,
people, vehicles and text in four primary video domains:
broadcast news, meetings, street surveillance, and unmanned
aerial vehicle motion imagery. Initial results were also
obtained on automatic analysis of human activities and
understanding of video sequences. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">Three performance evaluations were
conducted under the auspices of the VACE program between 2004
and 2007. The 2005 evaluation was administered by USF in
collaboration with NIST and guided by an advisory forum
including the evaluation participants. </span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">LDC has previously released <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V01"><span
style="color: blue;">NIST/USF Evaluation Resources for the
VACE Program -- Meeting Data Training Set Part 1 LDC2011V01</span></a>,
<a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V02"><span
style="color: blue;">NIST/USF Evaluation Resources for the
VACE Program -- Meeting Data Training Set Part 2 LDC2011V02</span></a>
and <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V03"><span
style="color: blue;">NIST/USF Evaluation Resources for the
VACE Program -- Meeting Data Test Set Part 1 LDC2011V03</span></a>.</span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;">NIST's Meeting Data Collection
Laboratory is designed to collect corpora to support research,
development and evaluation in meeting recognition technologies.
It is equipped to look and sound like a conventional meeting
space. The data collection facility includes five Sony EVI-D30
video cameras, four of which have stationary views of a center
conference table (one view from each surrounding wall) with a
fixed focus and viewing angle, and an additional "floating"
camera which is used to focus on particular participants, the
whiteboard or the conference table depending on the meeting forum.
The data is captured in a NIST-internal file format. The video
data was extracted from the NIST format and encoded using the
MPEG-2 standard in NTSC format. Further information concerning
the video data parameters can be found in the documentation
included with this corpus.</span></p>
<p class="MsoNormal" style="line-height: normal;"><span
style="font-size: 12pt;"><br>
</span></p>
<br>
<hr width="100%" size="2"><br>
<pre class="moz-signature" cols="72">Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</body>
</html>