<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <div class="moz-text-html" lang="x-western">

      <div class="moz-text-html" lang="x-western">

        <div align="center"> </div>

        <p class="MsoNormal" align="center"><i style="">In this

            newsletter:</i></p>

        <div align="center"> </div>

        <p class="MsoNormal" align="center"><b>-  <a href="#scholar">Spring

              2012 LDC Data Scholarship Program - deadline approaching!</a> 

            -</b></p>

        <div align="center"> </div>

        <p class="MsoNormal" align="center"><b style="">-  </b><b><a

              href="#lsa">LDC Exhibiting at LSA 2012 Annual Meeting</a></b><b

            style="">  -<br>

          </b></p>

        <div align="center"> </div>

        <p class="MsoNormal" align="center"><b style="">-  </b><b><a

              href="#workshop">LDC Hosts Satellite Workshop at LSA 2012</a> 

            -</b></p>

        <div align="center"> </div>

        <div align="center"> </div>

        <p class="MsoNormal" align="center"><i style="">New

            publications:</i></p>

        <div align="center"> </div>

        <p class="MsoNormal" align="center">LDC2011S10<br>

          <b style="">- </b><b><a href="#2006sre">2006 NIST Speaker

              Recognition Evaluation Test Set Part 1</a></b><b style=""> 

            -<br>

          </b></p>

        <div align="center"> </div>

        <p class="MsoNormal" align="center">LDC2011S11<br>

          <b>-  <a href="#2008sre">2008 NIST Speaker Recognition

              Evaluation Supplemental Set</a>  -</b></p>

        <div class="MsoNormal" style="text-align: center;"

          align="center">

          <hr width="100%" align="center" size="2"></div>

        <p class="MsoNormal" align="center"> <a name="scholar"></a><b

            style="">Spring 2012 LDC Data Scholarship Program - deadline

            fast approaching!</b></p>

        <p class="MsoNormal">The deadline for the Spring 2012 LDC Data

          Scholarship Program is less than a month away!   Applications

          are being accepted through January 15, 2012.  The LDC Data

          Scholarship program provides university students with access

          to LDC data at no cost.  This program is open to students

          pursuing both undergraduate and graduate studies in an

          accredited college or university. LDC Data Scholarships are

          not restricted to any particular field of study; however,

          students must demonstrate a well-developed research agenda and

          a bona fide inability to pay.  <br>

          <br>

          Students will need to complete an application, which consists
          of a data use proposal and a letter of support from their
          adviser.  For further information on application materials and

          program rules, please visit the <a

            href="http://www.ldc.upenn.edu/About/scholarships.html">LDC

            Data Scholarship</a> page.  </p>

        <p class="MsoNormal">Students can email their applications to

          the <a href="mailto:datascholarships@ldc.upenn.edu">LDC Data

            Scholarship program</a>. Decisions will be sent by email

          from the same address.<br>

          <br>

        </p>

        <div align="center"><a name="lsa"></a><b style="">LDC Exhibiting

            at LSA 2012 Annual Meeting</b><br style="">

        </div>

        <p class="MsoNormal" style="margin-bottom: 0.0001pt;

          line-height: normal;">LDC looks forward to mingling with

          linguists and language specialists when we exhibit at the 86<sup>th</sup>

          Annual Meeting of the Linguistic Society of America (LSA). The

          main conference will be held January 5-8, 2012, at the <a
            href="http://www.tourhiltonportland.com/">Portland, OR
            Hilton and Executive Tower</a>, and the exhibit hall will be
          open January 6-8 (limited hours on Sunday the 8<sup>th</sup>).

          Please stop by our display for news on what 2012 will hold for

          LDC and to receive some of our conference giveaways.</p>

        <p class="MsoNormal" style="margin-bottom: 0.0001pt;

          line-height: normal;">LSA 2012 will feature plenary talks on

          the following topics:<br>

        </p>

        <p class="MsoNormal" style="margin-bottom: 0.0001pt;

          line-height: normal;"> </p>

        <blockquote>

          <ul>
            <li>Patrice Speeter Beddor (University of Michigan):
              "The Dynamics of Speech Perception: Constancy, Variation,
              and Change"</li>
            <li>Dan Jurafsky (Stanford University): "Computing
              Meaning: Learning and Extracting Meaning from Text"</li>
            <li>Ted Supalla (University of Rochester):
              "Rethinking the Emergence of Grammatical Structure in
              Signed Languages: New Evidence from Variation and
              Historical Change in American Sign Language"</li>
          </ul>

        </blockquote>

        <p class="MsoNormal" style="margin-bottom: 0.0001pt;
          line-height: normal;">For further information, visit the <a
          href="http://www.lsadc.org/info/meet-annual.cfm">LSA Annual
          Meeting website</a>. If you would like to learn more about
        LDC’s conference preparations, please ‘like’ our <a
          href="http://www.facebook.com/ldc.upenn">Facebook</a> page.</p>

        <p class="MsoNormal" style="margin-bottom: 0.0001pt;

          line-height: normal;">We hope to see you there!</p>

        <p class="MsoNormal"><b style=""> </b><br>

        </p>

        <div align="center"><b style="">New Publications<br>

            <br>

          </b></div>

        <p class="MsoNormal"><a name="2006sre"></a>(1) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S10">2006

            NIST Speaker Recognition Evaluation Test Set Part 1</a> was

          developed by LDC and the National Institute of Standards and
          Technology (NIST).  It contains 437 hours of conversational
          telephone and microphone speech in English, Arabic, Bengali,
          Chinese, Farsi, Hindi, Korean, Russian, Spanish, Thai and Urdu,
          along with associated English transcripts, used as test data in
          the NIST-sponsored <a
            href="http://www.itl.nist.gov/iad/mig/tests/spk/2006/index.html">2006
            Speaker Recognition Evaluation (SRE)</a>. </p>

        <p class="MsoNormal">The ongoing series of yearly SRE
          evaluations conducted by NIST is intended to be of interest
          to researchers working on the general problem of
          text-independent speaker recognition. The task of the 2006 SRE
          was speaker detection, that is, to determine
          whether a specified speaker is speaking during a given segment
          of conversational telephone speech. The task was divided into
          15 distinct tests, each involving one of five training
          conditions and one of four test conditions. Further

          information about the test conditions and additional

          documentation is available at the <a

            href="http://www.itl.nist.gov/iad/mig/tests/spk/2006/index.html">NIST

            web site for the 2006 SRE</a> and within the <a

href="https://secure.ldc.upenn.edu/intranet/docs/LDC2011S10/sre-06_evalplan-v9.pdf">2006

SRE

            Evaluation Plan</a>.</p>

        <p class="MsoNormal">The speech data in this release was

          collected by LDC as part of the <a

            href="http://projects.ldc.upenn.edu/Mixer/">Mixer</a>

          project, in particular Mixer Phases 1, 2 and 3. The Mixer

          project supports the development of robust speaker recognition

          technology by providing carefully collected and audited speech

          from a large pool of speakers recorded simultaneously across

          numerous microphones and in different communicative situations

          and/or in multiple languages. The data is mostly English

          speech, but includes some speech in Arabic, Bengali, Chinese,

          Farsi, Hindi, Korean, Russian, Spanish, Thai and Urdu.</p>

        <p class="MsoNormal">The telephone speech segments are

          multi-channel data collected simultaneously from a number of

          auxiliary microphones. The files are organized into four

          types: two-channel excerpts of approximately 10 seconds,

          two-channel conversations of approximately 5 minutes,

          summed-channel conversations also of approximately 5 minutes,
          and two-channel conversations with the usual telephone speech
          replaced by auxiliary microphone data in the putative target
          speaker channel. The auxiliary microphone conversations are
          also approximately 5 minutes in length.</p>

        <p class="MsoNormal">English language transcripts in .ctm format

          were produced using an automatic speech recognition (ASR)

          system.</p>

        <p class="MsoNormal"><br>

          <br>

        </p>

        <div align="center"><b style="">*</b></div>

        <p class="MsoNormal"><a name="2008sre"></a>(2) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S11">2008

            NIST Speaker Recognition Evaluation Supplemental Set</a> was

          developed by LDC and the National Institute of Standards and

          Technology (NIST) and contains additional data distributed

          after the main 2008 Speaker Recognition Evaluation (SRE).

          Specifically, the corpus consists of 770 hours of English

          microphone speech along with transcripts and other materials

          used as supplemental data in the <a

            href="http://www.itl.nist.gov/iad/mig/tests/spk/2008/index.html">2008

            NIST Speaker Recognition Evaluation (SRE)</a> and in a

          follow-up evaluation to SRE08. </p>

        <p class="MsoNormal">The 2008 evaluation was distinguished from

          prior evaluations by including not only conversational

          telephone speech data but also conversational speech data of

          comparable duration recorded over a microphone channel

          involving an interview scenario. The follow-up evaluation

          focused on speaker detection in the context of conversational

          interview type speech and was designed to measure the

          performance of SRE08 systems in previously unexposed test

          segment channel conditions.</p>

        <p class="MsoNormal">LDC previously released the main 2008 NIST

          SRE Evaluation in three parts as <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S05">2008

NIST

            Speaker Recognition Evaluation Training Set Part 1

            LDC2011S05</a>, <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S07">2008

NIST

            Speaker Recognition Evaluation Training Set Part 2

            LDC2011S07</a> and <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S08">2008

NIST

            Speaker Recognition Evaluation Test Set LDC2011S08</a>.</p>

        <p class="MsoNormal">The speech data in this release was

          collected in 2007 by LDC at its <a

            href="http://www.ldc.upenn.edu/About/facilities.shtml">Human

            Subjects Data Collection Laboratories</a> in Philadelphia

          and by the <a href="http://www.icsi.berkeley.edu/">International

            Computer Science Institute</a> (ICSI) at the University of

          California, Berkeley. This collection was part of the <a

            href="http://projects.ldc.upenn.edu/Mixer/">Mixer 5</a>

          project, which was designed to support the development of

          robust speaker recognition technology by providing carefully

          collected and audited speech from a large pool of speakers

          recorded simultaneously across numerous microphones and in

          different communicative situations and/or in multiple

          languages. Mixer participants were native English and

          bilingual English speakers. The microphone speech in this

          corpus is in English and consists of interview excerpts of
          approximately 3 minutes and 30 minutes. </p>

        <p class="MsoNormal">This supplemental data is divided into four
          parts, which provide:</p>

        <ul>

          <li>new training data distributed to
            2008 SRE participants</li>
          <li>additional data distributed to
            participants in the 2008 SRE follow-up evaluation</li>
          <li>interviewer channel files for the
            2008 SRE main test (released after the evaluations)</li>
          <li>supplemental training data
            (released after the evaluations)</li>

        </ul>

        <p class="MsoNormal">English language transcripts in .ctm format

          were produced using an automatic speech recognition (ASR)

          system and are included for some, but not all, speech data.</p>

        <p class="MsoNormal"><br>

        </p>

        <hr width="100%" size="2">

        <pre class="moz-signature" cols="72">Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>

</pre>

      </div>

    </div>


  </body>

</html>