Corpora: What is a corpus

Mike Scott lexical at netcomuk.co.uk
Fri Jan 28 10:25:23 UTC 2000


Lucian Galescu wrote:

>It strikes me as ironic that corpus linguists would want to prescribe
>the usage of the word "corpus". Using Oliver's terminology, I would say
>that all corpora are `filtered'. Choosing 13th century texts, or
>Shakespeare's plays, or conversations with a travel agent, or the Bible,
>etc., etc., all are ways of filtering the abstract body of language
>around us for a specific purpose, since they all involve a criterion of
>what is in and what is out of the corpus.

I agree with Lucian. If, like me, you have a text focus in your work, you
will probably wish to collect a corpus of complete texts, that is, language
events which can be felt to be self-standing. But I wouldn't want to
restrict the term to that, since most folk do not seem to have a text focus
but rather a language focus. By that I mean that most folk make claims
about a language, not about a given text or set of texts. It therefore
seems reasonable for some purposes to collect past tense sentences, and if
the collection is electronic and esteemed by its creator, s/he will be
likely to prefer "corpus" as a label rather than "list".

Mike Scott, Liverpool


