Corpora: when does a subcorpus become a corpus

Sampo Nevalainen samponev at cc.joensuu.fi
Fri Jan 4 09:58:50 UTC 2002


Well, I guess I tried to focus on the issue of representativeness rather
than on the proper name for the set of texts, but, yes, the proper term is
probably 'special purpose corpus'. This, however, raises
another interesting question. I personally would hope that every single
corpus has been compiled for a particular purpose. Indeed, I wonder if
there really IS such a thing as a 'general corpus'. I have a feeling that
so-called 'general corpora', if they exist, are pretty useless in general
unless they are modified for a particular purpose or task. I suppose that in
empirical research you always have to choose your "object" (material)
according to your subject rather than use "just something"; that is, you
have to know your material. I guess no one would try to determine the average
height of human beings on the basis of a basketball team. The problem with
language is that exceptions are often neither evident nor easily detected,
since there is no clear "reference set" for language. In principle, if your
findings are truly generalizable, you should get similar results from any
corpus, although there is obviously more "noise" in more "general" corpora.
Am I right? Or am I a pedant? Or both? (As for "Terms in Context", which
I have read well beyond p. 45 :-), I liked the book, and I think I
could make use of some chapters in my course on corpora as translation tools.)

sincerely,
Sampo

At 09:54 4.1.2002 +0100, Pearson, Jennifer wrote:
>If you look at the same publication, p.48, you will find that I argue that,
>given Sinclair's definitions, neither the term subcorpus nor the term
>component is appropriate for the sets of texts I was working with (and
>probably not for the EAP texts referred to in previous e-mails either). I
>chose therefore to use the term special purpose corpus, "a corpus whose
>composition is determined by the precise purpose for which it is to be used.
>While a special purpose corpus may be derived from a general reference
>corpus or from a monitor corpus it will not constitute a subcorpus in the
>sense defined by Sinclair because it will not have all of the properties of
>a larger corpus." I coined this particular term for two reasons: a) because
>the language of the texts I was working with could be classified as
>'language for special purposes' or 'LSP', two terms that already existed in
>applied linguistics to designate, for example, the language of business, the
>language of medicine, the language of economics, and b) because the term
>'special purpose corpus' implies that the corpus has been compiled for a
>particular purpose.
>Wishing you all a happy new year
>Jennifer
>
>Dr Jennifer Pearson
>Chief of Translation
>UNESCO
>7 Place de Fontenoy
>75352 Paris 07
>Tel.: 00 33 1 456 80 780
>e-mail: j.pearson at unesco.org
>http://www.unesco.org
