[Corpora-List] Legal aspects of compiling corpora

Sampo Nevalainen samponev at cc.joensuu.fi
Thu Jun 19 08:26:31 UTC 2003


Hi,

>then we will face another problem of comparing approaches and techniques,
>if each of us use different corpora (without any possibility to share it
>with others because of the legal aspects) then no comparison will be possible.

My comment is clearly off topic, but I could not resist... This is one
thing I have not fully understood ever since I was irrevocably taken with
CL. Many textbooks on CL give the idea that a corpus should have a finite
size and be "a standard reference" (as McEnery and Wilson put it in "Corpus
Linguistics" 1996). In my humble opinion, this is rather unnatural, as,
after all, we are studying an open, ever-growing, dynamic, lively organism
(unless we are interested in "dead" languages). From this viewpoint, if we
are going to generalize anything about a language, at least I would have
more confidence in results that are based on several different corpora
rather than on a detailed description of a certain corpus. Just as with weather
forecasts or climate studies -- the more measurement points are available,
the more reliable they are. (Clearly, one practical solution is a kind of
"monitor corpus" -- or the Internet. I understand that the importance of
this question depends a lot on the purpose(s) of the corpus and the aim(s)
of the researcher, which, I think, should be convergent to some extent.) Of
course, the other side of the coin is economy. It would be a huge waste of
money and resources if everybody had to compile corpora of their own -- and
preferably non-stop!

sincerely
Sampo


