Corpora: Parallel corpus
Tomaz Erjavec
Tomaz.Erjavec at ijs.si
Wed Dec 20 18:38:51 UTC 2000
Yuliya Katsnelson writes:
> I am looking for a parallel corpus (news, etc.) in English and
> optimally, Eastern European languages. The second-best scenario would
If you are happy with Slovene-English, you can download or search our corpus at
http://nl.ijs.si/elan/#corpus
Tadeus has already mentioned the (double) TELRI CD-ROM. One CD holds
Plato's 'Republic' in over 20 languages (mostly Eastern European), and
the other holds the MULTEXT-East corpus (6 Eastern European languages,
approx. 300k words per language; cf. http://nl.ijs.si/ME/). I think the
CD-ROM is currently out of print, but check at http://www.telri.de/cdrom.html
Also, have a look at the TELRI Tractor archive, http://www.tractor.de/
Another possibility, though a far cry from newspapers, is the Linux
Documentation Project localisation files. They are copyleft and easy
to obtain; e.g. you can pick up the complete KDE desktop localisation
from http://i18n.kde.org/translation_archive/
Good luck,
Tomaz
--
Tomaz Erjavec | Dept. for Intelligent Systems E-8
email: tomaz.erjavec at ijs.si | Jozef Stefan Institute
www: http://nl.ijs.si/et/ | Jamova 39
fax: (+386 1) 425-1038 | SI-1000 Ljubljana, Slovenia