[Corpora-List] Looking for super large Russian corpus
Victor Zakharov
vz1311 at mail.ru
Mon Oct 25 13:57:02 UTC 2004
-----Original Message-----
> http://lib.ru/ claims to have close to 5Gb of Russian-language text, multiple
> genres, sources, etc.
>
> A substantial part of it is OCR'ed, and consequently some pieces exhibit
> problems, such as end-of-page hyphenation. So you may have to do some quality
> control, depending on your needs.
>
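For the end-of-page hyphenation mentioned in the quoted message, one possible clean-up step is sketched below. This is only a heuristic under stated assumptions, not a method used by lib.ru or anyone in this thread: it assumes the text is already decoded to Unicode, that a split word ends in a plain hyphen followed by a line or page break, and that the continuation starts with a lowercase letter. The name `join_hyphenated` is hypothetical.

```python
import re

# Heuristic pattern: a word fragment, a hyphen, a line or page break, then a
# lowercase continuation (Latin or Cyrillic). Genuine hyphenated compounds
# that happen to be split at a line end (e.g. "кто-" / "то") would also be
# joined, so the output still needs spot checks.
_BROKEN_WORD = re.compile(r"(\w+)-[ \t]*[\n\f]+[ \t]*([a-zа-яё]\w*)")


def join_hyphenated(text: str) -> str:
    """Rejoin words that were split by a hyphen at a line or page break."""
    return _BROKEN_WORD.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "Это элект-\nронная библио-\n\fтека."
    print(join_hyphenated(sample))
```

Hyphens inside a line are left alone, and so are breaks followed by a capital letter, since those are more likely to be real compounds or proper names than OCR line splits.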
Part of this digital library has been tagged and is accessible as a regular corpus at:
http://www.aot.ru/search1.html
Victor Zakharov
Department of Mathematical Linguistics
St.Petersburg State University