<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    <div class="moz-text-html" lang="x-western">

      <p class="MsoNormal" align="center"><b style="">-  </b><b> <a

            href="#scholar">Spring 2012 LDC Data Scholarship Recipients!</a></b><b

          style="">  -</b></p>

      <p class="MsoNormal" align="center"><i>New publications:</i></p>

      <p class="MsoNormal" align="center">LDC2012S03<br>

        <b>-   <a href="#dass">Digital Archive of Southern Speech

            (DASS)</a></b><b>  -</b></p>

      <p class="MsoNormal" align="center">LDC2012T01<br>

        <b>-   <a href="#modes">ModeS TimeBank 1.0</a></b><b>  -</b></p>

      <hr width="100%" size="2">

      <p class="MsoNormal" align="center"><b><br>

        </b> <a name="scholar"></a><b style="">Spring 2012 LDC Data

          Scholarship Recipients!</b></p>

      <p class="MsoNormal"> LDC is pleased to announce the student

        recipients of the Spring 2012 LDC Data Scholarship program! 

        This program provides university students with access to LDC
        data at no cost. Students were asked to complete an application
        consisting of a proposal describing their intended use of the
        data and a letter of support from their thesis adviser. We
        received many solid applications and have chosen six proposals
        to support. The following students will receive no-cost copies
        of LDC data: </p>

      <blockquote>

        <p class="MsoNormal">Zainab Ali Khalaf – University of Science,
          Malaysia (Malaysia), graduate student, Computer Science.

          Zainab has been awarded a copy of <i style="">1996 English

            Broadcast News Transcripts (HUB4)</i> (LDC97T22) for her

          work in spoken document retrieval.</p>

        <p class="MsoNormal">Daniel Jettka – Trinity College Dublin

          (Ireland), graduate student, Centre for Language &amp;
          Communication Studies. Daniel has been awarded copies of <i
            style="">Penn

            Discourse Treebank Version 2.0</i> (LDC2008T05) and <i

            style="">RST Discourse Treebank</i> (LDC2002T07) for his

          work in anaphora resolution.</p>

        <p class="MsoNormal">Olga Nickolaevna Ladoshko – National
          Technical University of Ukraine “KPI” (Ukraine), graduate
          student, Acoustics and Acoustoelectronics. Olga has been
          awarded copies of <i style="">NTIMIT</i> (LDC93S2) and <i
            style="">STC-TIMIT 1.0</i> (LDC2008S03) for her research in
          automatic speech recognition for Ukrainian.</p>

        <p class="MsoNormal">Ming Yang, Xiaoxiao Ma, and Jiajia Huang –

          Wuhan University (China), graduate students, Computer Science.
          Ming, Xiaoxiao, and Jiajia have been awarded copies of <i
            style="">ACE

            Time Normalization (TERN) 2004 English Training Data</i> <i

            style="">v 1.0</i> (LDC2005T07) and <i style="">GALE Phase

            1 Chinese Broadcast News Parallel Text – Part 1</i>

          (LDC2007T23) for their work in summarization and data mining.</p>

        <p class="MsoNormal">Daria Vazhenina – University of Aizu

          (Japan), graduate student, Human Interface Lab. Daria has been
          awarded a copy of <i style="">2005

            Spring NIST Rich Transcription (RT-05S) Evaluation Set</i>

          (LDC2011S06) for her work in speaker diarization.</p>

        <p class="MsoNormal">Tanina Zappone – University of Rome “La
          Sapienza” (Italy), graduate student, Oriental Studies. Tanina
          has been awarded a copy of <i style="">Chinese Treebank 7.0</i>
          (LDC2010T07) for her work on China’s political communications.</p>

      </blockquote>

      <p class="MsoNormal">Please join us in congratulating our student

        recipients!   The next LDC Data Scholarship program is scheduled

        for the Fall 2012 semester. </p>

      <p class="MsoNormal"> <br>

      </p>

      <div align="center"><b>New publications</b></div>

      <p class="MsoNormal"> <a name="dass"></a>(1) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012S03">Digital

          Archive of Southern Speech (DASS)</a> was developed by the

        University of Georgia. It is a subset of the <a

          href="http://www.lap.uga.edu/Site/LAGS.html">Linguistic Atlas

          of the Gulf States</a> (LAGS), which is in turn part of the <a

          href="http://www.lap.uga.edu/">Linguistic Atlas Project</a>
        (LAP). DASS contains approximately 370 hours of English speech
        data from 30 female and 34 male speakers in .wav and .mp3
        format, along with metadata about the speakers and recordings,
        and maps in .jpeg format of the recording locations.</p>

      <p class="MsoNormal">LAP, the largest project of its kind in the
        United States, consists of a set of survey research projects
        about the words and pronunciation of everyday American English.
        Interviews with thousands of native speakers across the country
        have been carried out since 1929. LAGS surveyed the everyday
        speech of Georgia, Tennessee, Florida, Alabama, Mississippi,
        Arkansas, Louisiana, and Texas in a series of 914 audio-taped
        interviews conducted between 1968 and 1983. Interviews average
        about six hours in length; the systematic LAGS tape archive
        amounts to 5500 hours of sound recordings. DASS is a collection
        of 64 LAGS interviews selected to cover a range of speech across
        the region and to represent multiple education levels and ethnic
        backgrounds. </p>

      <p class="MsoNormal">Also included in this release is a version of
        the LICHEN software developed at the University of Oulu,
        Finland. LICHEN provides a graphical interface for browsing and
        searching the audio data. </p>

      <div align="center"> * </div>

      <p class="MsoNormal"> <a name="modes"></a>(2) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T01">ModeS

          TimeBank 1.0</a> was developed by researchers at the <a
          href="http://www.upm.es/internacional">Technical University of
          Madrid</a> and <a href="http://www.barcelonamedia.org/en">Barcelona
          Media</a>. It is a corpus of Modern Spanish (17th and 18th
        centuries) annotated with temporal and event information
        following the TimeML markup language and with spatial
        information following the SpatialML scheme.</p>

      <p class="MsoNormal">TimeML (Pustejovsky et al., 2005) is a
        specification language for annotating eventualities and time
        expressions in natural language, as well as the temporal
        relations among them, to facilitate the extraction,
        representation, and exchange of temporal information. SpatialML
        (Mani et al., 2008) is a specification language for annotating
        spatial expressions and normalizing them to geographic
        coordinates.</p>

      <p class="MsoNormal">ModeS TimeBank 1.0 contains 102 documents
        reporting a sea voyage by a ship called La Princesa, which took
        place from December 1768 to April 1769. Copious logbooks survive
        from that period; they provide not only information about
        shipping routes but also valuable data concerning information
        flows, commercial agents, and social networks. </p>

      <p class="MsoNormal">All text is encoded in UTF-8. The data in

        ModeS TimeBank 1.0 has been tokenized, POS-tagged, and annotated

        with space, time and event information according to the TimeML

        and SpatialML specification schemes. </p>

      <p class="MsoNormal">ModeS TimeBank 1.0 is distributed via web

        download.</p>

      <p class="MsoNormal">Non-members may request this data by

        completing a copy of the <a

href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf">LDC

User

          Agreement for Non-Members</a>. The agreement can be faxed to +1
        215 573 2175 or scanned and emailed to LDC at the address below.
        This data is available at no charge.<br>

      </p>

      <hr width="100%" size="2"> <br>

      <pre class="moz-signature" cols="72">Ilya Ahtaridis

Membership Coordinator

--------------------------------------------------------------------

Linguistic Data Consortium                  Phone: 1 (215) 573-1275

University of Pennsylvania                    Fax: 1 (215) 573-2175

3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>

</pre>

    </div>

  </body>

</html>