Arabic-L:LING:New LDC Arabic Multiple Translation Corpus
Dilworth Parkinson
dilworth_parkinson at byu.edu
Fri Oct 24 22:17:23 UTC 2003
------------------------------------------------------------------------
Arabic-L: Fri 24 Oct 2003
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject: New LDC Arabic Multiple Translation Corpus
-------------------------Messages-----------------------------------
1)
Date: 24 Oct 2003
From: Linguistic Data Consortium <ldc at ldc.upenn.edu>
Subject: New LDC Arabic Multiple Translation Corpus
Multiple-Translation Arabic (MTA) Part 1 supports the development of
automatic means for evaluating translation quality. The corpus
contains 10 sets of human translations for a single set of Arabic
source materials. It also includes output from commercial
off-the-shelf (COTS) systems, a category covering both commercial
Machine Translation (MT) systems and MT systems available on the
Internet. There are 2 sets of COTS outputs in total, plus one output
set from a TIDES 2002 MT Evaluation participant that is representative
of state-of-the-art research systems.
To determine whether automatic evaluation systems such as BLEU track
human assessment, human assessments were performed on the two COTS
outputs and on the TIDES research system output. The corpus includes
the assessment results for one of the two COTS systems, the assessment
results for the TIDES research system, and the specifications used for
conducting the assessments.
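As a rough illustration of why multiple human translations matter for
automatic evaluation, the sketch below computes the clipped ("modified")
n-gram precision that BLEU is built on, scoring one candidate
translation against several references. It is not part of the corpus
or its documentation. The function names, the whitespace tokenization,
and the sample sentences are assumptions made for the example, and the
code covers only the precision component, not the full BLEU score
(no brevity penalty, no averaging over n-gram orders).

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against several references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    Tokenization is plain whitespace splitting (an assumption).
    """
    cand_counts = ngrams(candidate.split(), n)
    if not cand_counts:
        return 0.0

    # For every n-gram, keep the largest count seen in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical English references standing in for two human translations.
refs = ["the cat is on the mat", "there is a cat on the mat"]
score = modified_precision("the the the the the the the", refs, n=1)
```

With more reference sets, a candidate has more legitimate wordings to
match, which is the motivation for collecting several independent
human translations of the same source text.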
A total of 141 journalistic Arabic text files from the Xinhua and AFP
news services were selected for Multiple-Translation Arabic (MTA) Part
1. The corpus is available via ftp transfer.
For further information, including online documentation, please visit:
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T18
Institutions that have membership in the LDC during the 2003 Membership
Year will be able to receive this corpus free of charge. Nonmembers may
license this publication for $600.
If you need additional information before placing your order, or would
like to inquire about membership in the LDC, please send email to
<ldc at ldc.upenn.edu> or call (215) 573-1275.
------------------------------------------------------------------------
Linguistic Data Consortium
University of Pennsylvania
3600 Market Street, Suite 810
Philadelphia, PA 19104-2653
Phone: (215) 573-1275
Fax: (215) 573-2175
Email: ldc at ldc.upenn.edu
WWW: http://www.ldc.upenn.edu
------------------------------------------------------------------------
End of Arabic-L: 24 Oct 2003