[Corpora-List] Looking for super large Russian corpus

Roman Yangarber roman at cs.nyu.edu
Sat Oct 23 20:16:21 UTC 2004


    > Date: 	Sat, 23 Oct 2004 14:36:47 +0400 (MSD)
    > From: "P bI K O B___  B.B. (MOCKBA)" <rykov at narod.ru>
    >
    > I am looking for a super large Russian corpus to use in my research project.
    > The corpus doesn't require any tagging; it can be Russian text only.

http://lib.ru/ claims to have close to 5 GB of Russian-language text, covering
multiple genres, sources, etc.

A substantial part of it is OCR'ed, and consequently some pieces exhibit
problems, such as end-of-page hyphenation.  So you may have to do some quality
control, depending on your needs.
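
As a rough illustration of that kind of cleanup, here is a minimal Python
sketch that rejoins words split by a hyphen at a line or page break.  The
function name and the regular expression are my own assumptions, not anything
provided by lib.ru, and the heuristic will wrongly merge genuine hyphenated
compounds that happen to fall at a line end, so treat it as a starting point
only.

```python
import re

# Hypothetical heuristic: a word character, a hyphen, optional trailing
# spaces, one or more line breaks (a page break may leave blank lines),
# optional leading spaces, then a lowercase Cyrillic letter.
_HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*(?:\r?\n)+[ \t]*([а-яё])")


def join_hyphenated_breaks(text: str) -> str:
    """Rejoin words that were split by a hyphen across a line or page break."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "Это была очень старая кни-\nга, и каждая стра-\n\nница пожелтела."
    print(join_hyphenated_breaks(sample))
```

Hyphens inside a line (as in "какой-то") are left alone, because the pattern
only fires when the hyphen is followed by a line break.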

--
Roman Yangarber
______________________________     __________________________________________
                                   Research Assistant Professor
       voice +1 (212) 998-3264     Department of Computer Science
         fax +1 (212) 995-4123     Courant Institute of Mathematical Sciences
                                   New York University
              roman at cs.nyu.edu     715 Broadway, 7th Floor
          www.cs.nyu.edu/roman     New York, NY 10003-6806
______________________________     __________________________________________
      mobile: +358 50 4668 383     in Finland
______________________________     __________________________________________
