Corpora: What is a corpus
Oliver Mason
oliver at clg.bham.ac.uk
Fri Jan 28 12:14:18 UTC 2000
On Fri, Jan 28, 2000 at 12:06:09PM +0100, Sabine Bartsch, FB02 SprachLit wrote:
> Would Oliver agree that 'filtered' in his definition of a
> corpus is (near)synonymous with 'analysed'?
Probably. The main point I wanted to make was that I understand a
corpus to be a lump of real language, not extracts from it. So you
could have a corpus of almost any text type or genre, but it would no
longer be a corpus once you meddle with it, e.g. by extracting all
proverbs, noun phrases or whatnot. The result of that would be a list
of all the proverbs or noun phrases occurring in a particular corpus.
By what I rather imprecisely called `filtering' I meant this extraction
of elements from a corpus, not the creation of a corpus by selecting a
sample from the infinite amount of language data. But how would you
collect a `corpus of past tense sentences'? Where do you find them in
the real world? The plays of Shakespeare, on the other hand, can be
found as an entity which is, of course, a sample of language, and they
could conceivably be treated as a corpus of early modern English drama.
Oliver
--
//\\ computer officer | corpus research | department of english | school of -
//\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt -
\\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\ -
\\// mobile 07050 104504 | http://www.clg.bham.ac.uk | o.mason at bham.ac.uk\/ -