wanted: list of Russian corpus at uu.se

Emilio Millan emillan at cd.com
Wed Jun 14 13:36:37 UTC 1995


Y.TSUJI wrote:
> Hello,
> I got in touch with the Slavic department of Uppsala University where
> a million word ftp'able Russian corpus is located. The administrator
> who grants access permission is now away on holidays and cannot be
> contacted. I wonder if someone could possibly send me the list of
> what they have there. I am saying this because if most of the
> stuff is already in my computer, it won't make sense paying $600.

Mr. Tsuji (and everyone else):

You can find a complete list of the texts in the Uppsala corpus in
Appendices 1 and 2 of Chastotnyj slovar' sovremennogo russkogo jazyka
[A Frequency Dictionary of Modern Russian].  Lennart Lo"nngren, editor.
Acta Universitatis Upsaliensis, _Studia Slavica Upsaliensia_ 32.

From the English-language summary:

> This frequency dictionary is based on a corpus of some 600 Russian
> texts, consisting of a total of a million running words (word tokens),
> equally divided between informative and literary prose.  The
> informative texts are from between 1985 and 1989, while the literary
> texts, whose vocabulary does not date as quickly, cover a longer
> period, 1960-88.  The corpus does not include poetry or drama.

> Within the given framework, considerable effort has been made to
> ensure as representative and varied a corpus as possible.  The
> informative texts are drawn from 25 different subject areas:
> economics, foreign affairs/foreign policy, ideology/domestic policy,
> party matters, Soviet society, social issues, defence, education, law,
> history, culture, linguistics, medicine/health care, psychology,
> environment/ecology, agriculture, engineering, information technology,
> space research, energy, biology, geology/geography, physics, chemistry
> and sport.  Certain areas which we felt to be more important are
> represented by a larger volume of texts.

> The literary half of the corpus comprises work by the following 40
> authors: Abramov, Ajtmatov, Astaf'ev, Baklanov, Bek, Belov, Bitov,
> Bondarev, Dubov, Ganin, Gladyshev, Granin, Grekova, Goncharov,
> Iskander, Kaverin, Kazakov, Kochnev, Kozhevnikova, Nagibin, Lichanov,
> Lidin, Paustovskij, Pogodin, Pristavkin, Troepol'skij, Rasputin,
> Shcherbakova, Simonov, Solouchin, Shmelev, Tendrjakov, Tokareva,
> Tolstaja, Trifonov, Vasil'ev, Vorob'ev, Zalygin and Zorin.  Here, too,
> there is unequal representation, with a larger amount of writing by
> the better-known authors.

> A detailed breakdown of the corpus by subject area (in the case of the
> informative texts) and author (as regards the literary texts) is given
> in Appendix 1.  An exhaustive list of all the texts making up the
> corpus is to be found in Appendix 2.

Hope this helps!

                                            Emilio


+-------------------------------------------------------------------------+
  Emilio Millan                                            emillan at cd.com
  Central Data Corporation
  1602 Newton Drive                                        (217) 366-9253
  Champaign, IL  61821-1098                            FAX (217) 359-6904
+-------------------------------------------------------------------------+


