<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta http-equiv="content-type" content="text/html;

      charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><span style="font-size: 12pt;">-  </span><span

        style="font-size: 12pt;"><b><a href="#olympiad"><span

              style="color: blue;">LDC Sponsors a Student Group at 2011

              International Linguistics Olympiad</span></a></b>  -</span></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><span style="font-size: 12pt;">-  </span><span

        style="font-size: 12pt;"><b><a href="#meta"><span style="color:

              blue;">LDC Receives META Prize from META-NET</span></a></b> 

        -</span></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><i><span style="font-size: 12pt;">New

          publications:</span></i><span style="font-size: 12pt;"></span></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><b><span style="font-size: 12pt;">-  </span><span

          style="font-size: 12pt;"><a href="#sre"><span style="color:

              blue;">2005 NIST Speaker Recognition Evaluation Test Data</span></a> 

          -</span></b></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><b><span style="font-size: 12pt;">-  </span><span

          style="font-size: 12pt;"><a href="#std"><span style="color:

              blue;">2006 NIST Spoken Term Detection Evaluation Set</span></a> 

          -</span></b></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><b><span style="font-size: 12pt;">-  </span></b><span

        style="font-size: 12pt;"><b><a href="#vace"><span style="color:

              blue;">NIST/USF Evaluation Resources for the VACE Program

              - Meeting Data Test Set Part 2</span></a></b><b>  -</b></span><span

        style="font-size: 12pt;"></span></p>

    <div class="MsoNormal" style="margin-bottom: 0.0001pt; text-align:

      center; line-height: normal;" align="center"><span

        style="font-size: 12pt;">

        <hr width="100%" align="center" size="2"> </span></div>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><span style="font-size: 12pt;"><br>

        <b><a name="olympiad">LDC Sponsors a Student Group at 2011

            International Linguistics Olympiad</a></b></span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">LDC is happy to support the 2011

        International Linguistics Olympiad (IOL) by

        sponsoring a student team. The IOL is one of the twelve <a

          href="http://olympiads.win.tue.nl/"><span style="color: blue;">International

Science

            Olympiads</span></a> and is an annual event that brings

        together students from around the world to compete in

        linguistically based challenges. This year’s competition takes

        place July 24-30 at Carnegie Mellon University, Pittsburgh,

        PA, USA. Students do not need to have a background in

        linguistics in order to participate since they typically use

        analysis and deductive reasoning to solve the competition

        problems. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">Please visit the 2011 <a

          href="http://www.ioling.org/2011/"><span style="color: blue;">IOL

            website</span></a> for additional details. We wish good luck

        to all of the participants!</span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;"> </span></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><b><a name="meta"><span style="font-size:

            12pt;">LDC Receives META Prize from META-NET</span></a></b><span

        style="font-size: 12pt;"></span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;"> LDC was awarded a ‘2<sup>nd</sup> META

        Prize’ from META-NET ‘for outstanding long term commitment to

        the preparation and distribution of language resources and

        technologies.’</span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;"> The META Prize is awarded by META-NET

        to those who provide outstanding products or services that

        support the European Multilingual Information Society. <a

          href="http://www.meta-net.eu/mission"><span style="color:

            blue;">META-NET</span></a> is a Network of Excellence

        dedicated to fostering the technological foundations of a

        multilingual European information society. Several organizations

        were honored at this year’s META Forum in Budapest; LDC and <a

          href="http://www.elra.info/"><span style="color: blue;">ELRA</span></a>

        were both honored for supporting and developing language

        resources.<br>

        <br>

      </span></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><b><span style="font-size: 12pt;">New

          Publications </span></b></p>

    <p class="MsoNormal" style="line-height: normal;"><a name="sre"><span

          style="font-size: 12pt;">(1)</span></a><span style="font-size:

        12pt;"> <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S04"><span

            style="color: blue;">2005 NIST Speaker Recognition

            Evaluation Test Data</span></a> was developed at LDC and

        NIST (National Institute of Standards and Technology). It

        consists of 525 hours of conversational telephone speech in

        English, Arabic, Mandarin Chinese, Russian and Spanish and

        associated English transcripts used as test data in the

        NIST-sponsored <a

          href="http://www.itl.nist.gov/iad/mig/tests/spk/2005/index.html"><span

            style="color: blue;">2005 Speaker Recognition Evaluation</span></a>

        (SRE). The ongoing series of yearly SRE evaluations conducted by

        NIST is intended to be of interest to researchers working on

        the general problem of text-independent speaker recognition. To

        that end, the evaluations are designed to be simple, to focus on

        core technology issues, and to be fully supported and accessible. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">The task of the 2005 SRE was

        speaker detection, that is, to determine whether a specified

        speaker is speaking during a given segment of conversational

        speech. The task was divided into 20 distinct tests, each

        combining one of five training conditions with one of four test

        conditions. Further information about the task conditions is

        contained in <a

href="http://www.itl.nist.gov/iad/mig/tests/sre/2005/sre-05_evalplan-v6.pdf"><span

            style="color: blue;">The NIST Year 2005 Speaker Recognition

            Evaluation Plan</span></a>. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">The speech data consists of

        conversational telephone speech with "multi-channel" data

        collected by LDC simultaneously from a number of auxiliary

        microphones. The files are organized into two types of segment:

        10-second two-channel excerpts (continuous segments from single

        conversations that are estimated to contain approximately 10

        seconds of actual speech in the channel of interest) and

        5-minute two-channel conversations.</span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">The data are stored as 8-bit u-law

        speech signals in NIST SPHERE format. In addition to the

        standard header fields, the SPHERE header for each file contains

        some auxiliary information that includes the language of the

        conversation and whether the data were recorded over a telephone

        line. English language word transcripts in .cmt format were

        produced using an automatic speech recognition (ASR) system with

        error rates in the range of 15-30%.</span></p>
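    <p class="MsoNormal" style="line-height: normal;"><span
        style="font-size: 12pt;">As a rough illustration of how the header fields described above might be read, the following Python sketch parses a SPHERE-style text header into a dictionary. It is a minimal example under stated assumptions, not LDC or NIST software: the function name parse_sphere_header and the sample header are hypothetical, and the sample uses only the standard fields channel_count, sample_rate and sample_coding, because the names of the auxiliary fields are not listed here.</span></p>
    <pre>
```python
def parse_sphere_header(raw):
    """Parse the text header of a NIST SPHERE file into a dict."""
    lines = raw.decode("ascii", errors="replace").splitlines()
    if not lines or lines[0].strip() != "NIST_1A":
        raise ValueError("not a SPHERE header")
    fields = {}
    # lines[1] holds the header size in bytes; fields start on the next line.
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # field name, type flag, value
        if len(parts) != 3:
            continue
        name, kind, value = parts
        if kind == "-i":             # integer field
            fields[name] = int(value)
        elif kind == "-r":           # real-valued field
            fields[name] = float(value)
        else:                        # "-sN": string of N characters
            fields[name] = value
    return fields


# Hypothetical header for a two-channel, 8 kHz, u-law file.
sample = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 2\n"
    b"sample_rate -i 8000\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
)
info = parse_sphere_header(sample)
print(info)
```
</pre>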

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;"><br>

      </span></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><span style="font-size: 12pt;">*</span></p>

    <p class="MsoNormal" style="line-height: normal;"><a name="std"><span

          style="font-size: 12pt;">(2)</span></a><span style="font-size:

        12pt;"> <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S03"><span

            style="color: blue;">2006 NIST Spoken Term Detection

            Evaluation Set</span></a> was compiled by researchers at

        NIST (National Institute of Standards and Technology) and

        contains approximately eighteen hours of  Arabic, Chinese and

        English broadcast news, English conversational telephone speech

        and English meeting room speech used in NIST's <a

          href="http://www.itl.nist.gov/iad/mig/tests/std/2006/index.html"><span

            style="color: blue;">2006 Spoken Term Detection (STD)

            evaluation</span></a>. The STD initiative is designed to

        facilitate research and development of technology for retrieving

        information from archives of speech data with the goals of

        exploring promising new ideas in spoken term detection,

        developing advanced technology incorporating these ideas,

        measuring the performance of this technology and establishing a

        community for the exchange of research results and technical

        insights. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">The 2006 STD task was to find all of

        the occurrences of a specified "term" (a sequence of one or more

        words) in a given corpus of speech data. The evaluation was

        intended to develop technology for rapidly searching very large

        quantities of audio data. Although the evaluation used modest

        amounts of data, it was structured to simulate the very large

        data situation and to make it possible to extrapolate the speed

        measurements to much larger data sets. Therefore, systems were

        implemented in two phases: indexing and searching. In the

        indexing phase, the system processes the speech data without

        knowledge of the terms. In the searching phase, the system uses

        the terms, the index, and optionally the audio to detect term

        occurrences. </span></p>
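    <p class="MsoNormal" style="line-height: normal;"><span
        style="font-size: 12pt;">The two-phase structure described above can be sketched in a few lines of Python. This is a toy illustration only: it assumes the speech has already been converted to word sequences, whereas evaluated systems worked from audio and reported time stamps, and the names build_index, search and the sample transcripts are hypothetical.</span></p>
    <pre>
```python
from collections import defaultdict


def build_index(transcripts):
    """Indexing phase: process the data without knowing the search terms."""
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for position, word in enumerate(words):
            index[word.lower()].append((doc_id, position))
    return index


def search(index, term):
    """Searching phase: find every occurrence of a one-or-more-word term."""
    words = term.lower().split()
    if not words:
        return []
    hits = []
    # Start from each occurrence of the first word, then check that the
    # remaining words follow at consecutive positions in the same document.
    for doc_id, start in index.get(words[0], []):
        if all((doc_id, start + k) in index.get(w, [])
               for k, w in enumerate(words)):
            hits.append((doc_id, start))
    return hits


# Hypothetical word sequences standing in for recognized speech.
transcripts = {
    "bn1": "the stock market fell today".split(),
    "cts1": "i like the stock market report".split(),
}
index = build_index(transcripts)
print(search(index, "stock market"))
```
</pre>
    <p class="MsoNormal" style="line-height: normal;"><span
        style="font-size: 12pt;">Because the index is built once and each query only consults it, search time can be measured separately from indexing time, which is the property that makes extrapolation to larger archives possible.</span></p>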

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">The evaluation corpus consists of three

        data genres: broadcast news (BNews), conversational telephone

        speech (CTS) and conference room meetings (CONFMTG). The

        broadcast news material was collected in 2003 and 2004 by <a

href="http://www.ldc.upenn.edu/DataSheets/Broadcast_Collection_System_DS.pdf"><span

            style="color: blue;">LDC's broadcast collection system</span></a> from

        the following sources: ABC (English), Aljazeera (Arabic), China

        Central TV (Chinese), CNN (English), CNBC (English), Dubai TV

        (Arabic), New Tang Dynasty TV (Chinese), Public Radio

        International (English) and Radio Free Asia (Chinese). The CTS

        data was taken from the Switchboard data sets (e.g., <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98S75"><span

            style="color: blue;">Switchboard-2 Phase 1 LDC98S75</span></a>,

        <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC99S79"><span

            style="color: blue;">Switchboard-2 Phase 2 LDC99S79</span></a>)

        and the Fisher corpora (e.g., <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S13"><span

            style="color: blue;">Fisher English Training Speech Part 1

            LDC2004S13</span></a>), also collected by LDC. The

        conference room meeting material consists of goal-oriented,

        small-group roundtable meetings and was collected in 2004 and

        2005 by NIST, the International Computer Science Institute

        (Berkeley, California), Carnegie Mellon University (Pittsburgh,

        PA), TNO (The Netherlands) and Virginia Polytechnic Institute

        and State University (Blacksburg, VA) as part of the <a

          href="http://corpus.amiproject.org/"><span style="color:

            blue;">AMI corpus project</span></a>. This evaluation corpus

        includes scoring software, which uses the inputs described in the

        STD evaluation plan to complete the evaluation of a system. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">Each BNews recording is a 1-channel,

        PCM-encoded, 16 kHz, SPHERE-formatted file. CTS recordings are

        2-channel, u-law encoded, 8 kHz, SPHERE-formatted files. The

        CONFMTG files contain a single recorded channel.</span></p>
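    <p class="MsoNormal" style="line-height: normal;"><span
        style="font-size: 12pt;">For readers unfamiliar with u-law encoding, the sketch below shows how 8-bit u-law codes can be expanded to linear samples with the standard G.711 rule and how a two-channel payload can be split. It is an illustrative example rather than part of the corpus tools: the function names are hypothetical, and it assumes the payload bytes follow the header with the channels interleaved sample by sample.</span></p>
    <pre>
```python
def ulaw_byte_to_pcm(byte):
    """Expand one 8-bit u-law code (G.711) to a signed linear sample."""
    u = 255 - byte                 # u-law codes are stored bit-inverted
    sign = u // 128                # top bit: 1 means a negative sample
    exponent = (u // 16) % 8       # next three bits: segment number
    mantissa = u % 16              # low four bits: step within the segment
    magnitude = (mantissa * 8 + 132) * (2 ** exponent) - 132
    return -magnitude if sign else magnitude


def split_channels(payload, channel_count=2):
    """De-interleave sample bytes into one decoded list per channel."""
    return [
        [ulaw_byte_to_pcm(b) for b in payload[ch::channel_count]]
        for ch in range(channel_count)
    ]


# Four hypothetical payload bytes: two samples for each of two channels.
channels = split_channels(bytes([0xFF, 0x00, 0xFF, 0x80]))
print(channels)
```
</pre>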

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;"><br>

        <br>

      </span></p>

    <p class="MsoNormal" style="text-align: center; line-height:

      normal;" align="center"><span style="font-size: 12pt;">*</span></p>

    <p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height:

      normal;"><a name="vace"><span style="font-size: 12pt;">(3)</span></a><span

        style="font-size: 12pt;"> <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V04"><span

            style="color: blue;">NIST/USF Evaluation Resources for the

            VACE Program - Meeting Data Test Set Part 2</span></a> was

        developed by researchers at the <a

          href="http://www.cse.usf.edu/"><span style="color: blue;">Department

            of Computer Science and Engineering</span></a>, University

        of South Florida (USF), Tampa, Florida and the <a

          href="http://nist.gov/itl/iad/mig/"><span style="color: blue;">Multimodal

            Information Group</span></a> at the National Institute of

        Standards and Technology (NIST). It contains approximately

        thirteen hours of meeting room video data collected in 2001 and

        2002 at NIST's Meeting Data Collection Laboratory and used in

        the VACE (Video Analysis and Content Extraction) 2005

        evaluation. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">The VACE program was established to

        develop novel algorithms for automatic video content extraction,

        multi-modal fusion, and event understanding. During VACE Phases

        I and II, the program made significant progress in the automated

        detection and tracking of moving objects including faces, hands,

        people, vehicles and text in four primary video domains:

        broadcast news, meetings, street surveillance, and unmanned

        aerial vehicle motion imagery. Initial results were also

        obtained on automatic analysis of human activities and

        understanding of video sequences. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">Three performance evaluations were

        conducted under the auspices of the VACE program between 2004

        and 2007.  The 2005 evaluation was administered by USF in

        collaboration with NIST and guided by an advisory forum

        including the evaluation participants. </span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">LDC has previously released <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V01"><span

            style="color: blue;">NIST/USF Evaluation Resources for the

            VACE Program -- Meeting Data Training Set Part 1 LDC2011V01</span></a>,

        <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V02"><span

            style="color: blue;">NIST/USF Evaluation Resources for the

            VACE Program -- Meeting Data Training Set Part 2 LDC2011V02</span></a>

        and <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V03"><span

            style="color: blue;">NIST/USF Evaluation Resources for the

            VACE Program -- Meeting Data Test Set Part 1 LDC2011V03</span></a>.</span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;">NIST's Meeting Data Collection

        Laboratory is designed to collect corpora to support research,

        development and evaluation in meeting recognition technologies.

        It is equipped to look and sound like a conventional meeting

        space. The data collection facility includes five Sony EVI-D30

        video cameras, four of which have stationary views of a center

        conference table (one view from each surrounding wall) with a

        fixed focus and viewing angle, and an additional "floating"

        camera which is used to focus on particular participants,

        whiteboard or conference table depending on the meeting forum.

        The data is captured in a NIST-internal file format. The video

        data was extracted from the NIST format and encoded using the

        MPEG-2 standard in NTSC format. Further information concerning

        the video data parameters can be found in the documentation

        included with this corpus.</span></p>

    <p class="MsoNormal" style="line-height: normal;"><span

        style="font-size: 12pt;"><br>

      </span></p>

    <br>

    <hr width="100%" size="2"><br>

    <pre class="moz-signature" cols="72">Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>

</pre>


  </body>

</html>