[Corpora-List] News from LDC
Linguistic Data Consortium
ldc at ldc.upenn.edu
Fri Feb 24 17:18:31 UTC 2012
- Spring 2012 LDC Data Scholarship Recipients! -
/New publications:/
LDC2012S03
- Digital Archive of Southern Speech (DASS) -
LDC2012T01
- ModeS TimeBank 1.0 -
------------------------------------------------------------------------
*Spring 2012 LDC Data Scholarship Recipients!*
LDC is pleased to announce the student recipients of the Spring 2012 LDC
Data Scholarship program! This program provides university students
with access to LDC data at no cost. Students were asked to complete an
application which consisted of a proposal describing their intended use
of the data, as well as a letter of support from their thesis adviser.
We received many solid applications and have chosen six proposals to
support. The following students will receive no-cost copies of LDC data:
Zainab Ali Khalaf -- University of Science, Malaysia (Malaysia),
graduate student, Computer Science. Zainab has been awarded a copy
of /1996 English Broadcast News Transcripts (HUB4)/ (LDC97T22) for
her work in spoken document retrieval.
Daniel Jettka -- Trinity College Dublin (Ireland), graduate student,
Centre for Language & Communication Studies. Daniel has been awarded
copies of /Penn Discourse Treebank Version 2.0/ (LDC2008T05) and
/RST Discourse Treebank/ (LDC2002T07) for his work in anaphora
resolution.
Olga Nickolaevna Ladoshko -- National Technical University of Ukraine
"KPI" (Ukraine), graduate student, Acoustics and Acoustoelectronics.
Olga has been awarded copies of /NTIMIT/ (LDC93S2) and /STC-TIMIT
1.0/ (LDC2008S03) for her research in automatic speech recognition
for Ukrainian.
Ming Yang, Xiaoxiao Ma, and Jiajia Huang -- Wuhan University
(China), graduate students, Computer Science. Ming, Xiaoxiao, and
Jiajia have been awarded copies of /ACE Time Normalization (TERN)
2004 English Training Data v 1.0/ (LDC2005T07) and /GALE Phase 1
Chinese Broadcast News Parallel Text -- Part 1/ (LDC2007T23) for
their work in summarization and data mining.
Daria Vazhenina -- University of Aizu (Japan), graduate student,
Human Interface Lab. Daria has been awarded a copy of /2005 Spring
NIST Rich Transcription (RT-05S) Evaluation Set/ (LDC2011S06) for
her work in speaker diarization.
Tanina Zappone -- University of Rome "La Sapienza" (Italy), graduate
student, Oriental Studies. Tanina has been awarded a copy of /Chinese
Treebank 7.0/ (LDC2010T07) for her work on China's political
communications.
Please join us in congratulating our student recipients! The next LDC
Data Scholarship program is scheduled for the Fall 2012 semester.
*New publications*
(1) Digital Archive of Southern Speech (DASS)
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012S03>
was developed by the University of Georgia. It is a subset of the
Linguistic Atlas of the Gulf States
<http://www.lap.uga.edu/Site/LAGS.html> (LAGS), which is in turn part of
the Linguistic Atlas Project <http://www.lap.uga.edu/> (LAP). DASS
contains approximately 370 hours of English speech data from 30 female
speakers and 34 male speakers in .wav format and in .mp3 format, along
with associated metadata about the speakers and the recordings and maps
in .jpeg format relating to the recording locations.
LAP consists of a set of survey research projects about the words and
pronunciation of everyday American English, the largest project of its
kind in the United States. Interviews with thousands of native speakers
across the country have been carried out since 1929. LAGS surveyed the
everyday speech of Georgia, Tennessee, Florida, Alabama, Mississippi,
Arkansas, Louisiana, and Texas in a series of 914 audio-taped interviews
conducted from 1968 to 1983. Interviews average approximately six hours in
length; the systematic LAGS tape archive amounts to 5500 hours of sound
recordings. DASS is a collection of 64 interviews from LAGS selected to
cover a range of speech across the region and to represent multiple
education levels and ethnic backgrounds.
Also included in this release is a version of the LICHEN software
developed at the University of Oulu, Finland. LICHEN allows users to
browse and search through the audio data in a more advanced fashion
using a graphical interface.
(2) ModeS TimeBank 1.0
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2012T01>
was developed by researchers at Technical University of Madrid
<http://www.upm.es/internacional> and Barcelona Media
<http://www.barcelonamedia.org/en> and is a corpus of Modern Spanish
(17th and 18th centuries) annotated with temporal and event information
according to TimeML mark-up and annotated with spatial information
following the SpatialML scheme.
TimeML (Pustejovsky et al., 2005) is a specification language for
annotating eventualities and time expressions in natural language as
well as the temporal relations among them, thus facilitating the task of
extraction, representation and exchange of temporal information.
SpatialML (Mani et al., 2008) is a specification language for annotating
and normalizing spatial expressions by means of geographic coordinates.
ModeS TimeBank 1.0 contains 102 documents reporting a sea-crossing
cruise by a ship called La Princesa, which took place from December 1768
to April 1769. There exist copious logbooks from that period that not
only provide information about shipping routes, but also contain
valuable data concerning information flows, commercial agents and social
networks.
All text is encoded in UTF-8. The data in ModeS TimeBank 1.0 has been
tokenized, POS-tagged, and annotated with space, time and event
information according to the TimeML and SpatialML specification schemes.
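As a rough illustration of what TimeML-style annotation looks like to a program, the sketch below reads temporal expressions, events, and the links between them from a small inline-annotated text. The sample sentence is invented for this example and is not taken from the corpus. The tag names (TIMEX3, EVENT, TLINK) come from TimeML, but the attribute layout is simplified and the actual file format of ModeS TimeBank 1.0 may differ; treat the function name and sample as assumptions.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT corpus data. Attribute names are a simplified,
# assumed subset of TimeML; the released files may be structured differently.
SAMPLE = """<TimeML>
En <TIMEX3 tid="t1" type="DATE" value="1768-12">diciembre de 1768</TIMEX3>
<EVENT eid="e1" class="OCCURRENCE">partimos</EVENT> del puerto.
<TLINK lid="l1" eventID="e1" relatedToTime="t1" relType="IS_INCLUDED"/>
</TimeML>"""


def summarize(xml_text):
    """Return (times, events, links) found in a TimeML-like string."""
    root = ET.fromstring(xml_text)
    # Map each time expression id to its normalized value and surface text.
    times = {
        t.get("tid"): (t.get("value"), "".join(t.itertext()))
        for t in root.iter("TIMEX3")
    }
    # Map each event id to the text span that was marked as an event.
    events = {e.get("eid"): "".join(e.itertext()) for e in root.iter("EVENT")}
    # Each link relates an event to a time expression via a relation type.
    links = [
        (l.get("eventID"), l.get("relType"), l.get("relatedToTime"))
        for l in root.iter("TLINK")
    ]
    return times, events, links


if __name__ == "__main__":
    times, events, links = summarize(SAMPLE)
    for event_id, relation, time_id in links:
        print(events[event_id], relation, times[time_id][0])
```

SpatialML annotations could be read the same way by iterating over the scheme's place tags, again subject to how the release actually stores them.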
ModeS TimeBank 1.0 is distributed via web download.
Non-members may request this data by completing a copy of the LDC User
Agreement for Non-Members
<http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf>.
The agreement can be faxed to +1 215 573 2175 or scanned and emailed to
this address. This data is available at no charge.
------------------------------------------------------------------------
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium      Phone: 1 (215) 573-1275
University of Pennsylvania      Fax: 1 (215) 573-2175
3600 Market St., Suite 810      ldc at ldc.upenn.edu
Philadelphia, PA 19104 USA      http://www.ldc.upenn.edu