[Corpora-List] New LDC Publications

Linguistic Data Consortium ldc at ldc.upenn.edu
Mon Feb 27 19:32:37 UTC 2006


LDC2006T06
ACE 2005 Multilingual Training Corpus
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06>

LDC2006S29
Levantine Arabic QT Training Data Set 5, Speech
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S29>

LDC2006T07
Levantine Arabic QT Training Data Set 5, Transcripts
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T07>

The Linguistic Data Consortium (LDC) is pleased to announce the 
availability of three new publications.

------------------------------------------------------------------------
New LDC Publications

(1) ACE 2005 Multilingual Training Corpus 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06> 
contains the complete set of English, Arabic and Chinese training data 
for the 2005 Automatic Content Extraction (ACE) technology evaluation. 
The corpus consists of data of various types annotated for entities, 
relations and events and was created by the Linguistic Data Consortium 
with support from the ACE Program, with additional assistance from LDC. 
The objective of the ACE program is to develop automatic content 
extraction technology to support automatic processing of human language 
in text form.

In November 2005, sites were evaluated on system performance in five 
primary areas: the recognition of entities, values, temporal 
expressions, relations, and events. Entity, relation and event mention 
detection were also offered as diagnostic tasks. All tasks except the 
event tasks were performed for three languages: English, Chinese and 
Arabic. Event tasks were evaluated in English and Chinese only. The 
current publication comprises the official training data for these 
evaluation tasks.

A complete description of the ACE 2005 Evaluation can be found on the 
ACE Program website maintained by the National Institute of Standards 
and Technology (NIST) <http://www.nist.gov/speech/tests/ace/>.

For more information about linguistic resources for the ACE Program, 
including annotation guidelines, task definitions, free annotation tools 
and other documentation, please visit LDC's ACE website 
<http://projects.ldc.upenn.edu/ace/>.



(2) Levantine Arabic QT Training Data Set 5, Speech 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006S29> 
and (3) Levantine Arabic QT Training Data Set 5, Transcripts 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T07> 
cover 1660 calls totaling approximately 250 hours of telephone 
conversation in Levantine Arabic collected between 2003 and 2005. These 
publications combine four earlier training data sets: LDC2004E21 and 
LDC2004E22; LDC2004E65 and LDC2004E66; LDC2005S07 and LDC2005T03; and 
LDC2005S14 (Speech and Transcripts). The participants represent a range 
of Levantine Arabic dialects. More than half of the speakers are 
Lebanese; the others include Jordanian, Palestinian and Syrian 
participants.



------------------------------------------------------------------------

If you need further information, or would like to inquire about 
membership in the LDC, please email ldc at ldc.upenn.edu or call 
+1 215 573 1275.



--------------------------------------------------------------------

Linguistic Data Consortium                     Phone: (215) 573-1275
University of Pennsylvania                       Fax: (215) 573-2175
3600 Market St., Suite 810                         ldc at ldc.upenn.edu
Philadelphia, PA 19104 USA                  http://www.ldc.upenn.edu
