Corpora: Corpus Linguistics

ramesh at clg.bham.ac.uk
Sun Apr 29 22:18:16 UTC 2001


James L. Fidelholtz wrote:
Hmmm.  Maybe I'm not cut out to be a 'real' corpus linguist, if
this is true, since my principal interest is in relatively 'rare'
phenomena.

Ramesh writes:
As a large corpus would seem to be the best empirical evidence
we have at our disposal, only a `real' corpus linguist would be
able to tell you what is a `rare' phenomenon and what isn't....
The reason for focussing on non-rare phenomena is that one
can be more certain of looking at language features
that obtain throughout many varied idiolects, text-types, modes,
genres, contexts, etc.
The problem with a rare phenomenon is that one cannot be certain that
one of those factors (e.g. idiolect, typographic error, highly
constrained context) is not the sole explanation for it. It is
therefore less generalizable, and must be consigned to
the general rag-bag category at the bottom of every frequency list:
items on which one has to suspend judgement until more data
confirms them to be one-offs, or shows them to have been the tip of
the iceberg of a hitherto unnoticed phenomenon. A rare item may also
be the harbinger of language change, as a synchronic corpus
becomes a diachronic one when data is collected over a longer period of
time.

Best
Ramesh Krishnamurthy
Birmingham


