<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    <div align="center"><font face="Times New Roman, Times, serif"><i>In
          this newsletter:</i><br>
        <br>
        <b>- <a href="#scholar">Spring 2011 LDC Data Scholarship
            Recipients</a> -</b><br>
        <br>
        <b>- <a href="#nea">LDC at NEALLT 2011</a> -</b><br>
        <br>
        <i>New publications:</i><br>
        <br>
        <b>- <a href="#matr">2008/2010 NIST Metrics for Machine
            Translation (MetricsMaTr) GALE Evaluation Set</a> -</b><br>
        <br>
        <b>- <a href="#vace">NIST/USF Evaluation Resources for the VACE
            Program – Meeting Data Training Set Part 1</a> -</b></font><br>
      <br>

      <hr width="100%" size="2"><font face="Times New Roman, Times,

        serif"><br>

        <a name="scholar"></a><b style=""><span style="">Spring 2011 LDC

            Data Scholarship Recipients</span></b><span style=""></span><br>

        <span style=""></span></font> </div>

    <font face="Times New Roman, Times, serif"><span style=""> </span><br>

      <span style=""> LDC is pleased to announce the student recipients

        of the Spring 2011 LDC Data Scholarship program!  The LDC Data

        Scholarship program provides university students with access to

        LDC data at no cost. Students were asked to complete an

        application which consisted of a proposal describing their

        intended use of the data, as well as a letter of support from

        their thesis adviser. LDC received many solid applications from

        both undergraduate and graduate students attending universities

        across the globe.  After careful deliberation, we have chosen

        eight proposals to support.   These students will receive

        no-cost copies of LDC data:</span></font>

    <blockquote> <font face="Times New Roman, Times, serif"><span

          style="">Roberto Aceves - Monterrey Institute of Technology

          and Superior Studies, ITESM (Mexico), graduate student,

          Computer Science.  Roberto has been awarded a copy of <i>Speech

            in Noisy Environments (SPINE2) Part 1 Audio</i> and <i>Transcripts</i>

          (LDC2001S04 and LDC2001T05) for his research in automatic

          speech recognition in noisy environments.<br>

          <br>

        </span></font><font face="Times New Roman, Times, serif"><span

          style=""> Daniel Escobar - Monterrey Institute of Technology

          and Superior Studies, ITESM (Mexico), graduate student,

          Mechatronics and Automation.  Daniel has been awarded a copy
          of <i>Switchboard-2 Phase I (LDC98S75)</i> and <i>2003 NIST
            Speaker Recognition Evaluation (LDC2010S03)</i> for

          designing a parallel joint factor analysis architecture for a

          speaker verification system.<br>

          <br>

          Erhan Guven - The George Washington University (USA), graduate

          student, Computer Science.  Erhan has been awarded a copy of <i>Emotional

            Prosody (LDC2002S28)</i> for his work in classifying

          emotions based on features in spectrograms.<br>

          <br>

          Anup Kolya - Jadavpur University (India), graduate student,

          Computer Science and Engineering.  Anup has been awarded a

          copy of <i>ACE 2005 English SpatialML Annotations

            (LDC2008T03), ACE Time Normalization (TERN) 2004 English

            Evaluation Data V1.0 (LDC2010T18), </i>and<i> ACE Time

            Normalization (TERN) 2004 English Training Data v 1.0

            (LDC2005T07) </i>for his research in temporal information

          extraction. <br>

          <br>

          Benjamín Martínez Elizalde - Monterrey Institute of Technology

          and Superior Studies, ITESM (Mexico), graduate student,

          Computer Science.  Benjamín has been awarded a copy of <i>Switchboard-1

            Release 2 (LDC97S62)</i> and <i>2002 NIST Speaker
            Recognition Evaluation (LDC2004S04)</i> to

          support his research in speaker verification modeling.<br>

          <br>

          Hanan Waer - Newcastle University (UK), graduate student,

          Educational and Applied Linguistics.  Hanan has been awarded a

          copy of <i>CALLHOME Egyptian Arabic Transcripts (LDC97T19)</i>,

          <i>CALLHOME Egyptian Arabic Transcripts Supplement

            (LDC2002T38)</i>, and <i>Egyptian Colloquial Arabic Lexicon

            (LDC99L22)</i> for her research in comparing Arabic/English

          code switching in everyday Arabic conversation and academic

          discourse.<br>

          <br>

          Muhua Zhu - Northeastern University (China), graduate student,

          Natural Language Processing.  Muhua has been  awarded a copy

          of <i>Chinese Treebank 7.0</i> (LDC2010T07) to support the

          development of a high-accuracy Chinese parser.<br>

          <br>

          Vignesh Kalaiselvan, Ganapathy Raman Kasi, Preetham Samue,

          Ramsrinivas Anantharamakrishnan, and Sathyanarayan Jeevan -

          Amrita Vishwa Vidyapeetham University (India), undergraduate

          students, Electronics and Communication Engineering -  the

          group has been awarded <i>CALLHOME Speech, Transcripts, </i>and<i>

            Lexicon</i> in Egyptian Arabic and German for their research

          in deriving robust features for multilingual acoustic

          modeling.</span></font> </blockquote>

    <font face="Times New Roman, Times, serif"><span style=""></span><br>

      <span style=""> Please join us in congratulating our student

        winners!   The next LDC Data Scholarship program is scheduled

        for the Fall 2011 semester. <br>

        <br>

      </span> <span style=""> </span><br>

      <span style=""> </span></font>

    <div align="center"><font face="Times New Roman, Times, serif"><a

          name="nea"></a><b><span style="">LDC at NEALLT 2011<br>

          </span></b><span style=""></span><br>

        <span style=""></span></font> </div>

    <font face="Times New Roman, Times, serif"><span style=""> LDC will

        be exhibiting at the upcoming NEALLT (North East Association for

        Language Learning Technology) conference, which will be held at

        the University of Pennsylvania from 1-3 April 2011. <a

          href="http://neallt.org/"><span style="color: blue;">NEALLT</span></a>

        is the regional chapter of the International Association for

        Language Learning Technology and works to improve language

        instruction through the use of technology.</span><br>

      <span style=""> </span><br>

      <span style=""> In the presentation “Incorporating Resources and New
        Technologies in Language Education” on Saturday, April 2
        (Session 9: 4.00-4.20 pm, Cohen G17), LDC’s Dr Mohamed Maamouri
        will discuss how resources developed and distributed by LDC can
        aid language education. The presentation will

        highlight the LDC <a href="http://projects.ldc.upenn.edu/art/"><span

            style="color: blue;">Arabic Reading Enhancement Tool</span></a>,

        designed to support the development of reading skills for

        learning Arabic as a first and second language.</span><br>

      <span style=""> </span><br>

      <span style=""> </span><span style=""></span><br>

    </font>

    <div align="center"><font face="Times New Roman, Times, serif"><b><span

            style="">New Publications</span></b></font><br>

      <font face="Times New Roman, Times, serif"><b><span style=""></span></b></font></div>

    <font face="Times New Roman, Times, serif"><b><span style=""><br>

        </span></b><b><span style=""></span></b><a name="matr"></a><span

        style="">(1) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011T05">2008/2010

          NIST Metrics for Machine Translation (MetricsMaTr) GALE

          Evaluation Set (LDC2011T05)</a> is a package containing source

        data, reference translations, machine translations and

        associated human judgments used in the NIST 2008 and 2010

        MetricsMaTr evaluations. The package was compiled by researchers

        at NIST, making use of Arabic and Chinese broadcast, newswire

        and web data and reference translations collected and developed

        by LDC for Phase 2 and Phase 2.5 of the DARPA <a

          href="http://projects.ldc.upenn.edu/gale/index.html"><span

            style="color: blue;">GALE </span></a>program. <br>

        <br>

      </span><span style=""><a

          href="http://www.itl.nist.gov/iad/mig/tests/metricsmatr/"><span

            style="color: blue;">NIST MetricsMaTr</span></a> is a series

        of research challenge events for machine translation (MT)

        metrology, promoting the development of innovative MT metrics

        that correlate highly with human assessments of MT quality.

        Participants submit their metrics to NIST (National Institute of

        Standards and Technology). NIST runs those metrics on certain

        held-back test data for which it has human assessments measuring

        quality and then calculates correlations between the automatic

        metric scores and the human assessments. Specifically, the goals

        of MetricsMaTr are: to inform other MT technology evaluation

        campaigns and conferences with regard to improved metrology; to

        establish an infrastructure that encourages the development of

        innovative metrics; to build a diverse community that will bring

        new perspectives to MT metrology research; and to provide a

        forum for MT metrology discussion and for establishing future

        directions of MT metrology. <br>

        <br>

      </span><span style="">The first MetricsMaTr challenge was held in

        <a

          href="http://www.itl.nist.gov/iad/mig/tests/metricsmatr/2008/"><span

            style="color: blue;">2008</span></a>; the development data

        from the 2008 program is available from LDC as <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2009T05"><span
            style="color: blue;">2008 NIST Metrics for Machine
            Translation (MetricsMATR08) Development Data (LDC2009T05)</span></a>.

        The <a

href="https://secure.ldc.upenn.edu/intranet/docs/NISTMetricsMaTr10EvalPlan.pdf"><span

            style="color: blue;">MetricsMaTr10 evaluation plan</span></a>

        is included in this release.<br>

        <br>

      </span><span style=""> This release contains 149 documents with

        corresponding reference translations (Arabic-to-English and

        Chinese-to-English), system translations and human assessments.

        The human assessments include the following: Adequacy7 (a

        7-point scale for judging the meaning of a system translation

        with respect to the reference translation); Adequacy Yes/No

        (whether the given system segment meant essentially the same as

        the reference translation); Preference (the judges' preference

        between two candidate translations when compared to a human

        reference translation); and HTER (Human Targeted Error Rate,

        the human edits needed for a system translation to have the same
        meaning as a reference translation). </span><br>

      <span style=""> </span><br>

      <span style=""> </span><span style=""><br>

        <br>

      </span></font>

    <div align="center"><font face="Times New Roman, Times, serif"><span

          style=""><b>* </b></span></font><br>

    </div>

    <font face="Times New Roman, Times, serif"><span style=""> </span><br>

      <span style=""> </span><br>

      <span style=""> </span><a name="vace"></a><span style="">(2) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011V01">NIST/USF

          Evaluation Resources for the VACE Program  – Meeting Data

          Training Set Part 1 (LDC2011V01)</a> was developed by

        researchers at the <a href="http://www.cse.usf.edu/"><span

            style="color: blue;">Department of Computer Science and

            Engineering</span></a>, University of South Florida (USF),

        Tampa, Florida and the <a href="http://nist.gov/itl/iad/mig/"><span

            style="color: blue;">Multimodal Information Group</span></a>

        at the National Institute of Standards and Technology (NIST). It

        contains approximately fifteen hours of meeting room video data

        collected in 2001 and 2002 at NIST's Meeting Data Collection

        Laboratory and annotated for the 2005 face, person and hand
        detection and tracking tasks of the VACE (Video Analysis and
        Content Extraction) program.<br>

        <br>

      </span> <span style="">The VACE program was established to

        develop novel algorithms for automatic video content extraction,

        multi-modal fusion, and event understanding. During VACE Phases

        I and II, the program made significant progress in the automated

        detection and tracking of moving objects including faces, hands,

        people, vehicles and text in four primary video domains:

        broadcast news, meetings, street surveillance, and unmanned

        aerial vehicle motion imagery. Initial results were also

        obtained on automatic analysis of human activities and

        understanding of video sequences.</span><br>

      <span style=""> </span><br>

      <span style=""> Three performance evaluations were conducted under

        the auspices of the VACE program between 2004 and 2007.  The

        2005 evaluation was administered by USF in collaboration with

        NIST and guided by an advisory forum including the evaluation

        participants. <br>

        <br>

      </span> <span style="">NIST's Meeting Data Collection Laboratory

        is designed to collect corpora to support research, development

        and evaluation in meeting recognition technologies. It is

        equipped to look and sound like a conventional meeting space.

        The data collection facility includes five Sony EVI-D30 video
        cameras: four have stationary views of a center conference table
        with a fixed focus and viewing angle, and a fifth "floating"
        camera is used to focus on particular participants, the
        whiteboard or the conference table, depending on the meeting
        forum. The data is captured in a

        NIST-internal file format. The video data was extracted from the

        NIST format and encoded using the MPEG-2 standard in NTSC

        format. </span><br>

      <span style=""> </span><br>

      <span style=""> </span><br>

    </font>

    <hr width="100%" size="2"><br>

    <div align="center">

      <pre class="moz-signature" cols="72">Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>

    </div>


  </body>

</html>