<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    <div class="moz-text-html" lang="x-western">

      <div align="center"><font face="Times New Roman, Times, serif"><i><span

              style="">New Publications:</span></i></font></div>

      <p class="MsoNormal" style="margin-bottom: 12pt; line-height:

        normal;" align="center"><font face="Times New Roman, Times,

          serif"><span style="">LDC2011S01</span><b><span style=""><br>

            </span></b><b><span style="">-  </span> <a href="#sre">2005

              NIST Speaker Recognition Evaluation Training Data</a></b><b><span

              style="">  -<br>

            </span></b></font><font face="Times New Roman, Times, serif"><b><span

              style=""></span></b><b><span style=""><br>

            </span></b><span style="">LDC2011V03</span><b><span style=""><br>

              -  </span> <a href="#vace">NIST/USF Evaluation Resources

              for the VACE Program - Meeting Data Test Set Part 3</a></b> <b><span

              style=""> -</span></b></font><font face="Times New Roman,

          Times, serif"><b><span style=""></span></b></font></p>

      <div align="center"> </div>

      <p class="MsoNormal" style="margin-bottom: 12pt; line-height:

        normal;" align="center"><font face="Times New Roman, Times,

          serif"><span style=""></span></font></p>

      <hr width="100%" size="2"><br>

      <div align="center"><font face="Times New Roman, Times, serif"><span

            style=""><b>New Publications</b></span><br>

        </font> </div>

      <div align="center"><font face="Times New Roman, Times, serif"><br>

          <span style=""></span></font> </div>

      <p class="MsoNormal"><font face="Times New Roman, Times, serif"><a

            name="sre"></a><span style="">(1) </span><a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011S01"><span

              style="">2005 NIST Speaker Recognition Evaluation Training

              Data</span></a><span style=""> was developed at LDC and

            NIST (National Institute of Standards and Technology). It

            consists of <span style=""> </span>392 hours of

            conversational telephone speech in English, Arabic, Mandarin

            Chinese, Russian and Spanish and associated English

            transcripts used as training data in the NIST-sponsored </span><a

href="http://www.itl.nist.gov/iad/mig/tests/spk/2005/index.html"><span

              style="">2005 Speaker Recognition Evaluation</span></a><span

            style=""> (SRE). The ongoing series of yearly SRE

            evaluations conducted by NIST is intended to be of interest

            to researchers working on the general problem of

            text-independent speaker recognition. To that end, the evaluations

            are designed to be simple, to focus on core technology

            issues, to be fully supported and to be accessible to those

            wishing to participate. <br>

            <br>

            The task of the 2005 SRE was speaker detection,

            that is, to determine whether a specified speaker is

            speaking during a given segment of conversational speech.

            The task was divided into 20 distinct tests, each

            combining one of five training conditions with one of four

            test conditions. <br>

            <br>

            The speech data consists of conversational telephone speech

            with "multi-channel" data collected simultaneously from a

            number of auxiliary microphones. The files are organized

            into two types of segments: 10-second two-channel excerpts

            (continuous segments from single conversations that are

            estimated to contain approximately 10 seconds of actual

            speech in the channel of interest) and 5-minute two-channel

            conversations.</span></font></p>

      <p class="MsoNormal"><font face="Times New Roman, Times, serif"><span

            style="">The speech files are stored as 8-bit u-law speech

            signals in separate SPHERE files. In addition to the

            standard header fields, the SPHERE header for each file

            contains some auxiliary information that includes the

            language of the conversation and whether the data was

            recorded over a telephone line.<br>

            <br>

            English-language word transcripts in .cmt format were

            produced using an automatic speech recognition (ASR) system

            and have error rates in the range of 15-30%. <br>

            <br>

          </span><span style=""><br>

          </span></font></p>

      <div align="center"><font face="Times New Roman, Times, serif"><b><span

              style=""> *</span></b><br>

          <span style=""></span></font> </div>

      <p class="MsoNormal"><a name="vace"></a><font face="Times New

          Roman, Times, serif"><span style="">(2</span></font><font

          face="Times New Roman, Times, serif"><span style="">) </span><a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V03"><span

              style="">NIST/USF Evaluation Resources for the VACE

              Program - Meeting Data Test Set Part 3</span></a><span

            style="">, Linguistic Data Consortium (LDC) catalog number

            LDC2011V03 and ISBN 1-58563-579-0, was developed by

            researchers at the </span><a href="http://www.cse.usf.edu/"><span

              style="">Department of Computer Science and Engineering</span></a><span

            style="">, University of South Florida (USF), Tampa, Florida

            and the </span><a href="http://nist.gov/itl/iad/mig/"><span

              style="font-family: 'Times New Roman', serif; color: blue;">Multimodal

              Information Group</span></a><span style="font-family:

            'Times New Roman', serif;"> at the

            National Institute of Standards and Technology (NIST). It

            contains approximately eleven hours of meeting room video

            data collected in 2001 and 2002 at NIST's Meeting Data

            Collection Laboratory and annotated for the VACE (Video

            Analysis and Content Extraction) 2005 face, person and hand

            detection and tracking tasks.<br>

             <br>

            <span style="color: black;">The VACE program was established

              to develop novel algorithms for automatic video content

              extraction, multi-modal fusion, and event understanding.

              During VACE Phases I and II, the program made significant

              progress in the automated detection and tracking of moving

              objects including faces, hands, people, vehicles and text

              in four primary video domains: broadcast news, meetings,

              street surveillance, and unmanned aerial vehicle motion

              imagery. Initial results were also obtained on automatic

              analysis of human activities and understanding of video

              sequences. <br>

              <br>

              Three performance evaluations were conducted under the

              auspices of the VACE program between 2004 and 2007.  The

              2005 evaluation was administered by USF in collaboration

              with NIST and guided by an advisory forum including the

              evaluation participants. A summary of results of the

              evaluation can be found in the </span></span><a

            href="https://secure.ldc.upenn.edu/intranet/docs/VACE2005_report.pdf"><span

              style="">2005 VACE results and analysis paper</span></a><span

            style=""> included in this release. </span><span style=""><br>

            <br>

            NIST's Meeting Data Collection Laboratory is designed to

            collect corpora to support research, development and

            evaluation in meeting recognition technologies. It is

            equipped to look and sound like a conventional meeting

            space. The data collection facility includes five Sony

            EVI-D30 video cameras, four of which have stationary views

            of a center conference table (one view from each surrounding

            wall) with a fixed focus and viewing angle, and an

            additional "floating" camera which is used to focus on

            particular participants, the whiteboard or the conference table,

            depending on the meeting forum. The data is captured in a

            NIST-internal file format. The video data was extracted from

            the NIST format and encoded using the MPEG-2 standard in

            NTSC format. Further information concerning the video data

            parameters can be found in the documentation included with this

            corpus. <br>

          </span></font><br>

      </p>

      <br>

      <hr width="100%" size="2">

      <div align="center"><br>

        <pre class="moz-signature" cols="72">Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>

      </div>

    </div>


  </body>

</html>