Corpora: when does a subcorpus become a corpus
Rykov V.V.
rykov at narod.ru
Fri Jan 4 13:05:44 UTC 2002
I would like to give a small example. There was a report here about the distribution of the meanings of the verb "moch'". This Russian verb has two main meanings: "can" and "may".
Would my distributions, based on corpora such as the Corpus of Russian Proverbs, Political Metaphors, or Russian newspapers, have any value, or in other words, tell us something about the Russian language as a whole? I think that proof of this can be provided by the texts of a general, carefully compiled, balanced, representative corpus of the Russian language.
>Well I guess I tried to focus on the issue of representativeness rather
>than the proper nomination for the set of texts, but, yes, probably the
>proper term might be 'special purpose corpus'. This, however, raises
>another interesting question. I personally would hope that every single
>corpus had been compiled for a particular purpose. Indeed, I wonder if
>there really IS such a thing as a 'general corpus'? I have a feeling that
>so-called 'general corpora' - if they exist - are pretty useless in general,
>unless they're modified for a particular purpose or task. I suppose that in
>empirical research you always have to choose your "object" (material)
>according to your subject, and not to use "just something", i.e. you have
>to know your material: I guess no one would try to determine the average
>height of human beings on the basis of a basketball team. The problem with
>language is that exceptions are often not evident and not easily detected
>since there is no clear "reference set" for language. In principle, if your
>findings are truly generalizable you should get similar results from any
>corpus, although there is obviously more "noise" in more "general" corpora.
>Am I right? Or am I a pedant? Or both? ( About the "Terms in Context" - which
>I have indeed read more than up to p. 45 :-) -, I liked the book, and I think I
>could make use of some chapters in my course on corpora as translation tools. )
>
>sincerely,
>Sampo
>
>At 09:54 4.1.2002 +0100, Pearson, Jennifer wrote:
>>If you look at the same publication, p.48, you will find that I argue that,
>>given Sinclair's definitions, neither the term subcorpus nor the term
>>component is appropriate for the sets of texts I was working with (and
>>probably not for the EAP texts referred to in previous e-mails either). I
>>chose therefore to use the term special purpose corpus, "a corpus whose
>>composition is determined by the precise purpose for which it is to be used.
>>While a special purpose corpus may be derived from a general reference
>>corpus or from a monitor corpus it will not constitute a subcorpus in the
>>sense defined by Sinclair because it will not have all of the properties of
>>a larger corpus." I coined this particular term for two reasons, a) because
>>the language of the texts I was working with could be classified as
>>'language for special purposes' or 'LSP', two terms that already existed in
>>applied linguistics to designate, for example, the language of business, the
>>language of medicine, the language of economics, and b) because the term
>>'special purpose corpus' implies that the corpus has been compiled for a
>>particular purpose.
>>Wishing you all a happy new year
>>Jennifer
>>
>>Dr Jennifer Pearson
>>Chief of Translation
>>UNESCO
>>7 Place de Fontenoy
>>75352 Paris 07
>>Tel: 00 33 1 456 80 780
>>e-mail: j.pearson at unesco.org
>>http://www.unesco.org
>
>
>
>
--
Vladimir Rykov, PhD in Comp Linguistics,
MOSCOW
http://rykov.narod.ru/
Engl. http://www.blkbox.com/~gigawatt/rykov.html
Tel +7-903-749-19-99