[Corpora-List] New LDC Publications
Linguistic Data Consortium
ldc at ldc.upenn.edu
Mon Feb 27 19:32:37 UTC 2006
LDC2006T06
ACE 2005 Multilingual Training Corpus
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06>

LDC2006S29
Levantine Arabic QT Training Data Set 5, Speech
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S29>

LDC2006T07
Levantine Arabic QT Training Data Set 5, Transcripts
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T07>

The Linguistic Data Consortium (LDC) is pleased to announce the
availability of three new publications.
------------------------------------------------------------------------

New LDC Publications

(1) ACE 2005 Multilingual Training Corpus
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06>
contains the complete set of English, Arabic and Chinese training data
for the 2005 Automatic Content Extraction (ACE) technology evaluation.
The corpus consists of data of various types annotated for entities,
relations and events. It was created by the Linguistic Data Consortium
with support from the ACE Program.
The objective of the ACE program is to develop automatic content
extraction technology to support automatic processing of human language
in text form.
In November 2005, sites were evaluated on system performance in five
primary areas: the recognition of entities, values, temporal
expressions, relations, and events. Entity, relation and event mention
detection were also offered as diagnostic tasks. All tasks except the
event tasks were performed in three languages: English, Chinese and
Arabic. Event tasks were evaluated in English and Chinese only. This
publication contains the official training data for these evaluation
tasks.
A complete description of the ACE 2005 Evaluation can be found on the
ACE Program website maintained by the National Institute of Standards
and Technology (NIST) <http://www.nist.gov/speech/tests/ace/>.
For more information about linguistic resources for the ACE Program,
including annotation guidelines, task definitions, free annotation tools
and other documentation, please visit LDC's ACE website:
<http://projects.ldc.upenn.edu/ace/>

(2) Levantine Arabic QT Training Data Set 5, Speech
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S29>
and (3) Levantine Arabic QT Training Data Set 5, Transcripts
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T07>
cover 1660 calls totaling approximately 250 hours of telephone
conversation in Levantine Arabic collected between 2003 and 2005. These
publications combine four previously released training data sets:
LDC2004E21 and LDC2004E22, LDC2004E65 and LDC2004E66, LDC2005S07 and
LDC2005T03, and LDC2005S14 (Speech and Transcripts). The participants
represent a range of Levantine Arabic dialects. More than half of the
speakers are Lebanese; among the other speakers are Jordanian,
Palestinian and Syrian participants.
------------------------------------------------------------------------
If you need further information or would like to inquire about
membership in the LDC, please email ldc at ldc.upenn.edu or call
+1 215 573 1275.
--------------------------------------------------------------------
Linguistic Data Consortium Phone: (215) 573-1275
University of Pennsylvania Fax: (215) 573-2175
3600 Market St., Suite 810 ldc at ldc.upenn.edu
Philadelphia, PA 19104 USA http://www.ldc.upenn.edu