[Corpora-List] Looking for super large Russian corpus

Victor Zakharov vz1311 at mail.ru
Mon Oct 25 13:57:02 UTC 2004


-----Original Message-----

> http://lib.ru/ claims to have close to 5Gb of Russian-language text, multiple
> genres, sources, etc.
>
> a substantial part of it is OCR'ed, and consequently some pieces exhibit
> problems, such as end-of-page hyphenation.  so you may have to do some quality
> control, depending on your needs.
>


Part of this digital library has been tagged and is accessible as a regular corpus at:
http://www.aot.ru/search1.html

Victor Zakharov
Department of Mathematical Linguistics
St.Petersburg State University


