[Corpora-List] Russian Corpora at Russian Congress

RYKOV V.V. (MOSCOW) rykov at narod.ru
Mon Mar 22 12:02:54 UTC 2004


   The 2nd Russian Language Congress took place last week:

    http://www.philol.msu.ru/~rlc2004/en/inflet/index.php

    Soon all the reports will be on the site.

    I include the English abstract of my report below.

    My main idea is the following: when I investigated the Brown University Corpus, I could see what Americans REALLY read.

   Still, here and there people try to select the best text samples for their corpora, even when they want those corpora to represent the real speech activity in society. In effect, they replace the real pattern of human communication with their own idea of how it should be.

--------------


CORPUS OF TEXTS - A NEW TYPE OF WORD UNITY

Rykov V.V.  rykov2000 at mail.ru

Key words: text corpus, corpus linguistics, general philology, speech medium, speech texture, writing tools, representativeness.

The term "text corpus", or simply "corpus", is now in frequent use. Corpora are very often the sources for many kinds of empirical and theoretical research. Nevertheless, some important properties of corpora still have to be properly defined. The fact is that many people use the word in different ways. This leads to incorrect use of corpora and hence to misinterpretation of research results. The purpose of this paper is to specify the meaning of the term "text corpus" and thus to clarify the nature of the text corpus itself as a special kind of word unity. The standard definition contains four properties: machine-readable form, sampling and representativeness, finite size, and standard reference. This paper discusses all of these features using modern philological paradigms, paying special attention to sampling and representativeness.
Sampling procedures that follow the so-called corpus design criteria should ensure that the corpus texts representatively reflect the philological phenomena that were the purpose of the initial corpus design and the later sampling. This is the central point of the corpus definition under discussion.



--

  Regards, Vladimir Rykov

PhD in Computational Linguistics

Personal web-site: rykov.narod.ru
English version:   www.blkbox.com/~gigawatt/rykov.html



