<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
<div align="center"><small><font face="Courier New, Courier, monospace"><small><big>The
Linguistic Data Consortium
(LDC) would like to
announce the availability of three new corpora.<br>
<br>
</big></small></font></small></div>
<hr size="2" width="100%">
<p><small><font face="Courier New, Courier, monospace"><br>
(1) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T07">ACE
Time Normalization (TERN) 2004 English Training Data</a> contains
the English training data prepared for the 2004 Time Expression
Recognition and Normalization (TERN) Evaluation. The purpose of this
corpus and the TERN evaluation is to advance the state of the art in
the automatic recognition and normalization of natural language
temporal expressions. In most language contexts such expressions are
indexical. For example, with "Monday", "last week", or "three months
starting October 1", one must know the narrative reference time in
order to pinpoint the time interval being conveyed by the expression. <br>
</font></small></p>
<p><small><font face="Courier New, Courier, monospace">In addition, for
data exchange
purposes, it is essential that the
identified interval be rendered according to an established standard,
i.e., normalized. Accurate identification and normalization of temporal
expressions are in turn essential for the temporal reasoning
demanded by advanced NLP applications such as question answering,
information extraction, and summarization. <small><br>
<br>
</small></font></small></p>
<p><small><font face="Courier New, Courier, monospace">(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T02">Arabic
Treebank: Part 1 v 3.0 (POS with full vocalization and syntactic
analysis)</a> is a re-release of the LDC corpus Arabic Treebank: Part 1 v
2.0, with improved morphological/part-of-speech
annotation, including full vocalization and case endings. The corpus
supports the development of data-driven approaches to natural language
processing (NLP), human language technologies, automatic content
extraction, cross-lingual
information retrieval, information detection, and other forms of
linguistic research on Modern Standard Arabic.</font> </small></p>
<p><small><font face="Courier New, Courier, monospace">The project
aims to describe a corpus of written Modern Standard
Arabic drawn from the Agence France-Presse (AFP) newswire archives for
July-November 2000. This corpus includes 734 stories representing 145K
words.<small><big><br>
<br>
</big></small></font></small></p>
<p><small><font face="Courier New, Courier, monospace">(3) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T05">Multiple
Translation Arabic (MTA) Part 2</a> supports the development of
automatic means for evaluating translation quality. The corpus contains
four sets of human translations and two sets of output from
commercial off-the-shelf (COTS) systems for a single set of
Arabic source materials. Additionally, there is one output set from a
TIDES 2003 MT Evaluation participant, which is representative of
state-of-the-art research systems. </font></small></p>
<font face="Times New Roman"><small><font
face="Courier New, Courier, monospace">To determine whether automatic
evaluation metrics such as BLEU track human
assessment, the LDC performed human assessments of the two COTS outputs
and the TIDES research system output. The corpus includes the assessment
results for one of the two COTS systems, the assessment results for the
TIDES research system, and the specifications used for conducting the
assessments. </font></small><br>
<br>
</font>
<hr size="2" width="100%"><font face="Times New Roman"><br>
</font>
<div align="center"><font face="Courier New"><small>If you need further
information, or would like to inquire about
membership in the LDC, please email <a class="moz-txt-link-abbreviated"
href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a> or call +1 215
573 1275.<br>
<br>
<br>
</small></font></div>
<div align="center">--------------------------------------------------------------------<br>
</div>
<div align="center">
<pre class="moz-signature" cols="72">Linguistic Data Consortium     Phone: (215) 573-1275
3600 Market Street             Fax: (215) 573-2175
Suite 810                      <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104         <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>
</div>
<br>
</body>
</html>