Corpora: Parallel corpus

Tomaz Erjavec Tomaz.Erjavec at ijs.si
Wed Dec 20 18:38:51 UTC 2000


Yuliya Katsnelson writes:
 > I am looking for a parallel corpus (news, etc.) in English and
 > optimally, Eastern European languages.  The second-best scenario would

If you are happy with Slovene-English, you can download or search the
corpus at http://nl.ijs.si/elan/#corpus

Tadeus has already mentioned the (double) TELRI CD-ROM, which has on
one CD Plato's 'Republic' in over 20 languages (mostly Eastern
European), and on the other the MULTEXT-East corpus (6 Eastern
European languages, ca. 300,000 words per language, cf.
http://nl.ijs.si/ME/). I think the CD-ROM is currently out of print,
but check with http://www.telri.de/cdrom.html
Also, have a look at the TELRI Tractor archive, http://www.tractor.de/

Another possibility, though a far cry from newspapers, is the Linux
Documentation Project localisation files. They are copyleft and easy
to get to; e.g., you can pick up the complete KDE desktop localisation
from http://i18n.kde.org/translation_archive/

Good luck,
Tomaz

--
Tomaz Erjavec                  | Dept. for Intelligent Systems E-8
email: tomaz.erjavec at ijs.si    | Jozef Stefan Institute
www:   http://nl.ijs.si/et/    | Jamova 39
fax:   (+386 1) 425-1038       | SI-1000 Ljubljana, Slovenia
