[Corpora-List] Summary: Parallel texts for MT evaluation

D Elliott debe at comp.leeds.ac.uk
Fri Jun 13 12:00:20 UTC 2003


Dear all,

Thanks to everyone who responded to my request for parallel texts with
good quality human translations, suitable for my MT evaluation research.

Here is a summary of resources available from the web:

INTERSECT corpus
FRENCH-ENGLISH:
Le Monde, instructions for domestic appliances, technical and academic
texts and others
GERMAN-ENGLISH:
Company home pages, news items, EU documents and more
http://www.brighton.ac.uk/edusport/languages/html/intersect.html
Thanks to Professor Raphael Salkie, University of Brighton, UK

Proceedings of the European Parliament
MANY EUROPEAN LANGUAGES INTO ENGLISH
http://www.isi.edu/~koehn/publications/europarl/
Thanks to Susana Sotelo Docío, Universidade de Santiago de Compostela

OPUS corpus
ENGLISH SOURCE TEXTS translated into French, Spanish, Swedish, German, and
Japanese.
Jörg Tiedemann and Lars Nygaard compiled the documentation of the office
package OpenOffice[1] and the PHP[2] manual. The resulting corpus is OPUS
- an open source parallel corpus.
http://logos.uio.no/opus/
[1] http://www.openoffice.org
[2] http://www.php.net
Thanks to Susana Sotelo Docío, Universidade de Santiago de Compostela

UN Universal Declaration of Human Rights
Many languages
http://www.unhchr.ch/udhr/index.htm
Thanks to Paul McNamee, Johns Hopkins University and Ella Earp-Lynch,
SpeechWorks International

Centers for Disease Control and Prevention (USA)
Chinese, French, Japanese, Spanish info on SARS and many other medical
topics
http://www.cdc.gov/
http://www.cdc.gov/ncidod/sars/languages.htm
Thanks to Paul McNamee, Johns Hopkins University

Debian free software community:
Technical translations
http://www.debian.org/international/
Thanks to Paul McNamee, Johns Hopkins University

Official journal of the EU
Freely downloadable European legislation in many languages
http://europa.eu.int
Thanks to Paul McNamee, Johns Hopkins University, Terence Lewis (Language
Engineer) and Koen Kerremans

Public registry of the Council of the EU
PDF files in various languages. Translations indicate the source
language.
http://register.consilium.eu.int/
Thanks to John Beaven

COMPARA corpus
English-Portuguese/Portuguese-English
http://www.linguateca.pt/COMPARA/
Thanks to Dr Ana Frankenberg-Garcia, Instituto Superior de Línguas e
Administração, Lisboa, Portugal

The Universal Declaration of Human Rights
UNESCO's website also has most documents available translated into
Spanish, French and frequently into Russian, Chinese and Arabic

French Foreign Ministry's magazine - Label France:
French into various languages
http://www.france.diplomatie.fr/label_france/index.html
Thanks to Jeremy Whistle, University College Northampton

ELRA newsletter
In French and English
www.elda.fr
Thanks to Jeff Allen

Multilingual articles:
English version:
http://www.multilingual.com/allen51.htm
French translation:
http://www.editionscle.com/bol/presse/article1/allen-mltc51-fr.htm
English version: http://www.multilingual.com/allen53.htm
French translation:
http://www.editionscle.com/bol/presse/article2/allen-mltc53-fr.htm
Thanks to Jeff Allen

Haitian Creole version:
http://hometown.aol.com/mit2haiti/JA-HC-kr.htm
English version:
http://hometown.aol.com/mit2haiti/JA-HC-eng.htm
Thanks to Jeff Allen

MIT2 website
Marilyn Mason Bio & Publication List:
http://hometown.aol.com/marilinc/Index3.html
Creole Links Page:
http://hometown.aol.com/mit2haiti/Index4.html
The Creole Clearinghouse:
http://hometown.aol.com/CreoleCH/Index6.html
Thanks to Jeff Allen


--
***************************************************
Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.
Tel: 0113 3436818
Email: debe at comp.leeds.ac.uk
Website (to be expanded):
http://www.comp.leeds.ac.uk/cgi-bin/sis/ext/rs_pub.cgi/debe.html?cmd=displayrs
***************************************************


