[Corpora-List] Summary: Corpus of translated material

Nomi Guthmann nomi.guthmann at googlemail.com
Thu Mar 8 12:46:33 UTC 2007


Dear corpora list members,

Here is a summary of the responses I received about corpora of translated
material (the main requirement was that the source language of the
translations be known):

The EUROPARL corpus
http://people.csail.mit.edu/koehn/publications/europarl/
In its current form, it does not include information on the source
language of the various texts, but I was told that its next release
will.

The English-Estonian and Estonian-English parallel corpus
http://www.cl.ut.ee/korpused/paralleel/index.php?lang=en
It includes Estonian laws and EU legislation, and their translations.

The INTERSECT corpus
http://www.brighton.ac.uk/languages/contact/academicstaff/intersect.html
It includes English-French, English-German translations in several domains.

The COMPARA corpus
http://www.linguateca.pt/COMPARA/Welcome.html
It includes English and Portuguese bi-directional parallel texts.

The OPUS corpus
http://logos.uio.no/opus/
It is an open-source parallel corpus covering several languages.
Jörg Tiedemann also has a corpus of aligned movie subtitles, available
for research purposes only.

The TEC corpus
http://www.llc.manchester.ac.uk/Research/Centres/CentreforTranslationandInterculturalStudies/ResearchProgrammesPhDMPhil/TranslationEnglishCorpus/
A large corpus of translated English.

The Bible corpus
http://www.umiacs.umd.edu/~resnik/parallel/bible.html

Corina Forascu has a section of the TimeBank 1.2 (English) corpus
translated into Romanian.

JRC-Acquis multilingual parallel corpus
http://langtech.jrc.it/JRC-Acquis.html
A parallel corpus in several languages. The source languages in this
corpus are unknown.

The CroCo project
http://fr46.uni-saarland.de/croco
Corpus of German and English translations. The corpus is not available
for copyright reasons.

Many thanks for their responses to:
Chris Callison-Burch
Israel Cohen
Corina Forascu
Ana Frankenberg-Garcia
Hieu Hoang
Heiki Kaalep
Andrea Mulloni
Stella Neumann
Sebastian Padó
Raphael Salkie
Armin Schmidt
Harold Somers
Ralf Steinberger
Jörg Tiedemann


Noemie Guthmann
Translation and Interpreting Studies Department
Bar Ilan University


