<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#ffffff">
<div class="moz-text-html" lang="x-western">
<p class="MsoNormal" align="center"><b style="">- </b><b> <a
href="#scholar">Spring 2012 LDC Data Scholarship Recipients!</a></b><b
style=""> -</b></p>
<p class="MsoNormal" align="center"><i>New publications:</i></p>
<p class="MsoNormal" align="center">LDC2012S03<br>
<b>- <a href="#dass">Digital Archive of Southern Speech
(DASS)</a></b><b> -</b></p>
<p class="MsoNormal" align="center">LDC2012T01<br>
<b>- <a href="#modes">ModeS TimeBank 1.0</a></b><b> -</b></p>
<hr width="100%" size="2">
<p class="MsoNormal" align="center"><b><br>
</b> <a name="scholar"></a><b style="">Spring 2012 LDC Data
Scholarship Recipients!</b></p>
<p class="MsoNormal"> LDC is pleased to announce the student
recipients of the Spring 2012 LDC Data Scholarship program!
This program provides university students with access to LDC
data at no cost. Students were asked to complete an application
consisting of a proposal describing their intended use of
the data and a letter of support from their thesis
adviser. We received many solid applications and have
chosen six proposals to support. The following students will
receive no-cost copies of LDC data: </p>
<blockquote>
<p class="MsoNormal">Zainab Ali Khalaf – University of Science,
Malaysia (Malaysia), graduate student, Computer Science.
Zainab has been awarded a copy of <i style="">1996 English
Broadcast News Transcripts (HUB4)</i> (LDC97T22) for her
work in spoken document retrieval.</p>
<p class="MsoNormal">Daniel Jettka – Trinity College Dublin
(Ireland), graduate student, Centre for Language &amp;
Communication Studies. Daniel has been
awarded copies of <i style="">Penn
Discourse Treebank Version 2.0</i> (LDC2008T05) and <i
style="">RST Discourse Treebank</i> (LDC2002T07) for his
work in anaphora resolution.</p>
<p class="MsoNormal">Olga Nickolaevna Ladoshko – National
Technical University of Ukraine “KPI” (Ukraine), graduate
student, Acoustics and Acoustoelectronics. Olga has been
awarded copies of <i
style="">NTIMIT</i> (LDC93S2) and <i style="">STC-TIMIT 1.0</i>
(LDC2008S03) for her research in automatic speech recognition
for Ukrainian.</p>
<p class="MsoNormal">Ming Yang, Xiaoxiao Ma, and Jiajia Huang –
Wuhan University (China), graduate students, Computer Science.
Ming, Xiaoxiao, and Jiajia have been
awarded copies of <i style="">ACE
Time Normalization (TERN) 2004 English Training Data
v 1.0</i> (LDC2005T07) and <i style="">GALE Phase
1 Chinese Broadcast News Parallel Text – Part 1</i>
(LDC2007T23) for their work in summarization and data mining.</p>
<p class="MsoNormal">Daria Vazhenina – University of Aizu
(Japan), graduate student, Human Interface Lab.
Daria has been awarded a copy of <i style="">2005
Spring NIST Rich Transcription (RT-05S) Evaluation Set</i>
(LDC2011S06) for her work in speaker diarization.</p>
<p class="MsoNormal">Tanina Zappone – University of Rome “La
Sapienza” (Italy), graduate student, Oriental Studies.
Tanina has been awarded a copy of <i
style="">Chinese Treebank 7.0</i> (LDC2010T07) for her work
on China’s political communications.</p>
</blockquote>
<p class="MsoNormal">Please join us in congratulating our student
recipients! The next LDC Data Scholarship program is scheduled
for the Fall 2012 semester. </p>
<p class="MsoNormal"> <br>
</p>
<div align="center"><b>New publications</b></div>
<p class="MsoNormal"> <a name="dass"></a>(1) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012S03">Digital
Archive of Southern Speech (DASS)</a> was developed by the
University of Georgia. It is a subset of the <a
href="http://www.lap.uga.edu/Site/LAGS.html">Linguistic Atlas
of the Gulf States</a> (LAGS), which is in turn part of the <a
href="http://www.lap.uga.edu/">Linguistic Atlas Project</a>
(LAP). DASS contains approximately 370 hours of English speech
data from 30 female speakers and 34 male speakers in .wav and
.mp3 formats, along with associated metadata about the
speakers and the recordings, and maps in .jpeg format relating to
the recording locations.</p>
<p class="MsoNormal">LAP consists of a set of survey research
projects about the words and pronunciation of everyday American
English, the largest project of its kind in the United States.
Interviews with thousands of native speakers across the country
have been carried out since 1929. LAGS surveyed the everyday
speech of Georgia, Tennessee, Florida, Alabama, Mississippi,
Arkansas, Louisiana, and Texas in a series of 914 audio-taped
interviews conducted from 1968 to 1983. Interviews average
approximately six hours in length; the systematic LAGS tape
archive amounts to 5500 hours of sound recordings. DASS is a
collection of 64 interviews from LAGS selected to cover a range
of speech across the region and to represent multiple education
levels and ethnic backgrounds. </p>
<p class="MsoNormal">Also included in this release is a version of
the LICHEN software developed at the University of Oulu,
Finland. LICHEN allows users to browse and search through the
audio data in a more advanced fashion using a graphical
interface. </p>
<div align="center"> * </div>
<p class="MsoNormal"> <a name="modes"></a>(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T01">ModeS
TimeBank 1.0</a> was developed by researchers at <a
href="http://www.upm.es/internacional">Technical University of
Madrid</a> and <a href="http://www.barcelonamedia.org/en">Barcelona
Media</a> and is a corpus of Modern Spanish (17th and 18th
centuries) annotated with temporal and event information
according to TimeML mark-up and annotated with spatial
information following the SpatialML scheme.</p>
<p class="MsoNormal">TimeML (Pustejovsky et al., 2005) is a
specification language for annotating eventualities and time
expressions in natural language as well as the temporal
relations among them, thus facilitating the task of extraction,
representation and exchange of temporal information. SpatialML
(Mani et al., 2008) is a specification language for annotating
and normalizing spatial expressions by means of geographic
coordinates.</p>
<p class="MsoNormal">ModeS TimeBank 1.0 contains 102 documents
reporting a sea-crossing cruise by a ship called La Princesa,
which took place from December 1768 to April 1769. There exist
copious logbooks from that period that not only provide
information about shipping routes, but also contain valuable
data concerning information flows, commercial agents and social
networks. </p>
<p class="MsoNormal">All text is encoded in UTF-8. The data in
ModeS TimeBank 1.0 has been tokenized, POS-tagged, and annotated
with space, time and event information according to the TimeML
and SpatialML specification schemes. </p>
<p class="MsoNormal">ModeS TimeBank 1.0 is distributed via web
download.</p>
<p class="MsoNormal">Non-members may request this data by
completing a copy of the <a
href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf">LDC
User
Agreement for Non-Members</a>. The agreement can be faxed to +1
215 573 2175 or scanned and emailed to this address. This data
is available at no charge.<br>
</p>
<hr width="100%" size="2"> <br>
<pre class="moz-signature" cols="72">Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a>
</pre>
</div>
</body>
</html>