Arabic-L:LING:New LDC Arabic Multiple Translation Corpus

Dilworth Parkinson dilworth_parkinson at byu.edu
Fri Oct 24 22:17:23 UTC 2003


------------------------------------------------------------------------
Arabic-L: Fri 24 Oct  2003
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:New LDC Arabic Multiple Translation Corpus

-------------------------Messages-----------------------------------
1)
Date: 24 Oct  2003
From:Linguistic Data Consortium <ldc at ldc.upenn.edu>
Subject:New LDC Arabic Multiple Translation Corpus

Multiple-Translation Arabic (MTA) Part 1 supports the development of
automatic means for evaluating translation quality. The corpus
contains 10 sets of human translations for a single set of Arabic
source materials. It also includes output from commercial
off-the-shelf (COTS) systems, a category that covers commercial Machine
Translation (MT) systems as well as MT systems available on the
Internet. There are 2 sets of COTS output in total, plus one output
set from a TIDES 2002 MT Evaluation participant, which is
representative of state-of-the-art research systems.

To determine whether automatic evaluation systems, such as BLEU, track
human assessment, human assessments were performed on the two COTS
outputs and on the TIDES research system output. The corpus includes
the assessment results for one of the two COTS systems, the assessment
results for the TIDES research system, and the specifications used for
conducting the assessments.
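
As a rough illustration of why several reference translations matter
for a metric such as BLEU, the following Python sketch computes clipped
n-gram precision, the core quantity of BLEU, against multiple
references. It is not the full BLEU score (no brevity penalty, no
combination of n-gram orders) and it is not the scoring tool used for
this corpus; the function names and the whitespace tokenization are
assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of one candidate against several references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it occurs most often.
    Tokenization is plain whitespace splitting (an assumption).
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0

    # For every n-gram, record its maximum count over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    refs = ["the cat is on the mat", "there is a cat on the mat"]
    print(clipped_precision("the the the the the the the", refs, n=1))
```

With more references, an acceptable word choice in a system output is
more likely to find a match, which is one reason a corpus with 10 human
translations per source text is useful for studying such metrics.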

A total of 141 Arabic news text files from the Xinhua and AFP
news services were selected for Multiple-Translation Arabic (MTA) Part
1. The corpus is available via FTP transfer.

For further information, including online documentation, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T18

Institutions that are LDC members for the 2003 Membership Year can
receive this corpus free of charge. Nonmembers may license this
publication for $600.


If you need additional information before placing your order, or would  
like to inquire about membership in the LDC, please send email to  
<ldc at ldc.upenn.edu> or call (215) 573-1275.


------------------------------------------------------------------------
Linguistic Data Consortium          Phone: (215) 573-1275
University of Pennsylvania          Fax:   (215) 573-2175
3600 Market Street  Suite 810       email: ldc at ldc.upenn.edu
Philadelphia, PA 19104-2653         www:   http://www.ldc.upenn.edu

------------------------------------------------------------------------
End of Arabic-L:  24 Oct  2003