Corpora: when does a subcorpus become a corpus

Sampo Nevalainen samponev at cc.joensuu.fi
Fri Jan 4 13:36:55 UTC 2002


At 16:05 4.1.2002 +0300, P bI K O B_          B.B. wrote:
>   I would like to give a small example. There was a report here about the
> distribution of the meanings of the verb "moch'". This Russian verb has
> two main meanings - "can" and "may".
>   Would my distributions, based on corpora like the Corpus of Russian
> Proverbs, Political metaphors or Russian newspapers, have any value - or,
> in other words, tell us something about the Russian language as a whole? I
> think that the proof of it can be given by the texts of a general, carefully
> compiled, balanced, representative corpus of the Russian language.

I will also try to make it more concrete. In my opinion, the value of your
distributions from any single corpus is pretty low (from the point of view
of the language as a whole), unless you get more evidence from other
corpora. If all three of the corpora you mentioned above show a similar
tendency, you may become more convinced that it is a general
feature of the language "as a whole", but you may still be wrong. In this
light there is, obviously, no big difference whether you handle the corpora
as separate subcorpora or as one bunch that you might then call a "general
corpus"...

sampo



( : ============================================= : )

Sampo Nevalainen, M.A.
Researcher
University of Joensuu
Savonlinna School of Translation Studies
P.O.Box 48
FIN-57101 Savonlinna
FINLAND

tel     +358-15-511 70      (operator)
         +358-15-511 7704
fax     +358-15-515 096
email   samponev at cc.joensuu.fi
http://www.joensuu.fi/slnkvl/


