[Corpora-List] Looking for super large Russian corpus
Roman Yangarber
roman at cs.nyu.edu
Sat Oct 23 20:16:21 UTC 2004
> Date: Sat, 23 Oct 2004 14:36:47 +0400 (MSD)
> From: "P bI K O B___ B.B. (MOCKBA)" <rykov at narod.ru>
>
> I am looking for super large Russian corpus to use in my research project.
> Corpus doesn't require any tagging, it can be Russian text only.
http://lib.ru/ claims to have close to 5 GB of Russian-language text, covering
multiple genres and sources.
A substantial part of it is OCR'ed, and consequently some pieces exhibit
problems such as end-of-page hyphenation. You may have to do some quality
control, depending on your needs.
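
As a rough illustration of the kind of quality control meant here, the sketch below rejoins words that were split by a hyphen at a line or page break. It is only a heuristic under stated assumptions: the text is already decoded to Unicode, a broken word shows up as "fragment, hyphen, newline(s), fragment", and the function name join_hyphenated is made up for this example. It is not tied to any particular lib.ru file layout.

```python
import re

# Letters only (any script), then a hyphen at end of line, optional blank
# lines, then another run of letters. [^\W\d_] means "a \w that is not a
# digit or underscore", i.e. a Unicode letter, so Cyrillic is covered.
_BROKEN_WORD = re.compile(r"([^\W\d_]+)-[ \t]*\n+[ \t]*([^\W\d_]+)")


def join_hyphenated(text: str) -> str:
    """Rejoin words split by a hyphen across a line or page break.

    Heuristic (an assumption, not a rule of Russian orthography): only
    join when the continuation starts with a lowercase letter, so that
    capitalized compounds such as place names are left untouched.
    """
    def repl(match: re.Match) -> str:
        head, tail = match.group(1), match.group(2)
        if tail[0].islower():
            return head + tail
        return match.group(0)  # leave the original text unchanged

    return _BROKEN_WORD.sub(repl, text)


if __name__ == "__main__":
    sample = "Городская библио-\nтека открыта."
    print(join_hyphenated(sample))
```

Note that this will also wrongly join genuine hyphenated compounds (for example "кто-то") when the line happens to break at their hyphen, so it may be worth logging the joins and spot-checking them against a word list.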
--
Roman Yangarber
______________________________ __________________________________________
Research Assistant Professor
voice +1 (212) 998-3264        Department of Computer Science
fax   +1 (212) 995-4123        Courant Institute of Mathematical Sciences
                               New York University
roman at cs.nyu.edu            715 Broadway, 7th Floor
www.cs.nyu.edu/roman           New York, NY 10003-6806
______________________________ __________________________________________
mobile: +358 50 4668 383 in Finland
______________________________ __________________________________________