Corpora: What is a corpus

Oliver Mason oliver at clg.bham.ac.uk
Fri Jan 28 12:14:18 UTC 2000


On Fri, Jan 28, 2000 at 12:06:09PM +0100, Sabine Bartsch, FB02 SprachLit wrote:

> Would Oliver agree that 'filtered' in his definition of a
> corpus is (near)synonymous with 'analysed'?

Probably.  The main point I wanted to make was that I understand a
corpus to be a lump of real language, not extracts of the same.  So you
could have a corpus of almost anything that is a text type or genre,
but it wouldn't be a corpus any more once you meddle with it, e.g. by
extracting all proverbs, noun phrases or whatnot.  The result of that
would be a list of all proverbs or noun phrases occurring in a
particular corpus.

By what I rather imprecisely called `filtering' I meant this extraction
of elements from a corpus, not the creation of a corpus from the
infinite amount of language data by selecting a sample of it.  But how
do you collect a `corpus of past tense sentences'?  Where do you find
them in the real world?  You can find the plays of Shakespeare as an
entity which, of course, is a sample of language, and could conceivably
be treated as a corpus of early modern English drama.

Oliver

--
//\\ computer officer | corpus research | department of english | school of  -
//\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt  -
\\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\  -
\\// mobile 07050 104504 | http://www.clg.bham.ac.uk | o.mason at bham.ac.uk\/  -


