[Corpora-List] Manually annotated alignments
Linguistic Data Consortium
ldc at ldc.upenn.edu
Fri May 18 14:58:34 UTC 2007
Hi Lexi,
You may want to consider the following resources from the LDC to
determine if they meet your specific research needs:
Korean-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T26
Czech-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T25
Arabic-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T10
Chinese-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T02
Our full catalog is available at:
http://www.ldc.upenn.edu/Catalog/index.jsp
Please email us directly if you would like any additional information on the
above.
Ilya
Alexandra Birch wrote:
> Hi there,
>
> I am searching for manually annotated word/phrase alignments from
> parallel corpora. So far I have discovered:
>
> ACL2003 shared task
> http://www.cs.unt.edu/~rada/wpt/
> Romanian - English (Mihalcea & Pedersen 2003)
> English - French (Och & Ney 2000)
>
> ACL2005 shared task
> http://www.cse.unt.edu/~rada/wpt05/
> English - Inuktitut
> English - Hindi
>
> EPPS Word Alignment Trial and Test Set
> Spanish - English (500 sentences)
> http://gps-tsc.upc.es/veu/LR/epps_ensp_alignref.php3
>
> I will keep looking but I would appreciate it if anyone could
> inform me of other resources they know about.
>
> Thank you
>
> Lexi
--
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium          Phone: 1 (215) 573-1275
University of Pennsylvania          Fax: 1 (215) 573-2175
3600 Market St., Suite 810          ldc at ldc.upenn.edu
Philadelphia, PA 19104 USA          http://www.ldc.upenn.edu