[Corpora-List] Manually annotated alignments

Linguistic Data Consortium ldc at ldc.upenn.edu
Fri May 18 14:58:34 UTC 2007


Hi Lexi,

You may want to consider the following resources from the LDC to
determine if they meet your specific research needs:

Korean-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T26

Czech-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T25

Arabic-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T10

Chinese-English
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T02

Our full catalog is available at:

http://www.ldc.upenn.edu/Catalog/index.jsp

Please email us directly if you would like any additional information on the
above.

Ilya


Alexandra Birch wrote:

> Hi there,
>
> I am searching for manually annotated word/phrase alignments from
> parallel corpora. So far I have discovered:
>
> ACL2003 shared task
> http://www.cs.unt.edu/~rada/wpt/
> Romanian - English (Mihalcea & Pedersen 2003)
> English - French (Och & Ney 2000)
>
> ACL2005 shared task
> http://www.cse.unt.edu/~rada/wpt05/
> English - Inuktitut
> English - Hindi
>
> EPPS Word Alignment Trial and Test Set
> Spanish - English (500 sentences)
> http://gps-tsc.upc.es/veu/LR/epps_ensp_alignref.php3
>
> I will keep looking, but I would appreciate it if anyone could
> inform me of other resources they know about.
>
> Thank you
>
> Lexi


-- 


Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium                  Phone: 1 (215) 573-1275
University of Pennsylvania                    Fax: 1 (215) 573-2175
3600 Market St., Suite 810                        ldc at ldc.upenn.edu
Philadelphia, PA 19104 USA                 http://www.ldc.upenn.edu


