<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

<div align="center"><small><font face="Courier New, Courier, monospace"><small><big>The

Linguistic Data Consortium

(LDC) would like to

announce the availability of three new corpora.<br>

<br>

</big></small></font></small></div>


<hr size="2" width="100%">

<p><small><font face="Courier New, Courier, monospace"><br>

(1)  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T07">ACE

Time Normalization (TERN) 2004 English Training Data</a> contains

the English training data prepared for the 2004 Time Expression

Recognition and Normalization (TERN) Evaluation.  The purpose of this

corpus and the TERN evaluation is to advance the state of the art in

the automatic recognition and normalization of natural language

temporal expressions. In most language contexts such expressions are

indexical. For example, with "Monday", "last week", or "three months

starting October 1", one must know the narrative reference time in

order to pinpoint the time interval being conveyed by the expression. <br>

</font></small></p>

<p><small><font face="Courier New, Courier, monospace">In addition, for

data exchange

purposes, it is essential that the

identified interval be rendered according to an established standard,

i.e., normalized. Accurate identification and normalization of temporal

expressions are, in turn, essential for the temporal reasoning

demanded by advanced NLP applications such as question answering,

information extraction, and summarization.  <small><br>

<br>

</small></font></small></p>

<p><small><font face="Courier New, Courier, monospace">(2)  <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T02">Arabic

Treebank: Part 1 v 3.0 (POS with full vocalization and syntactic

analysis)</a> is a re-release of the LDC corpus Arabic Treebank: Part 1 v

2.0, with the addition of improved morphological/part-of-speech

annotation including full vocalization and case endings.  The corpus

supports the development of data-driven approaches to natural language

processing (NLP), human language technologies, automatic content

extraction, cross-lingual

information retrieval, information detection, and other forms of

linguistic research on Modern Standard Arabic.</font> </small></p>

<p><small><font face="Courier New, Courier, monospace">The project

targets the description of

a written Modern Standard

Arabic corpus from the Agence France Presse (AFP) newswire archives for

July-November 2000. This corpus includes 734 stories representing 145K

words.<small><big><br>

<br>

</big></small></font></small></p>

<p><small><font face="Courier New, Courier, monospace">(3) <a

 href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T05">Multiple

Translation Arabic (MTA) Part 2</a> supports the development of

automatic means for evaluating translation quality. The corpus contains

4 sets of human translations and 2 sets of

commercial off-the-shelf (COTS) system outputs for a single set of

Arabic source materials.  Additionally, there is one output set from a

TIDES 2003 MT Evaluation participant, which is representative of

state-of-the-art research systems. </font></small></p>

<font face="Times New Roman"><small><font

 face="Courier New, Courier, monospace">To see if automatic evaluation

metrics,

such as BLEU, track human

assessment, the LDC performed human assessment on the two COTS outputs

and the TIDES research system. The corpus includes the assessment

results for one of the two COTS systems, the assessment results for the

TIDES research system, and the specifications used for conducting the

assessments.  </font></small><br>

<br>

</font>

<hr size="2" width="100%"><font face="Times New Roman"><br>

</font>

<div align="center"><font face="Courier New"><small>If you need further

information, or would like to inquire about

membership in the LDC, please email <a class="moz-txt-link-abbreviated"

 href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a> or call +1 215

573 1275.<br>

<br>

<br>

</small></font></div>

<div align="center">--------------------------------------------------------------------<br>

</div>

<div align="center">

<pre class="moz-signature" cols="72">Linguistic Data Consortium                     Phone: (215) 573-1275

3600 Market Street                             Fax:   (215) 573-2175

Suite 810                                          <a

 class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>

Philadelphia, PA 19104                      <a

 class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>

</div>

<br>

</body>

</html>