Corpora: What is a corpus

ramesh at clg.bham.ac.uk
Fri Jan 28 23:50:29 UTC 2000


A belated comment:
Following on from Lou's comments, which I generally agree with:
isn't the point that one can apply filters at the *text* level?
That is, one can say `this corpus contains only 19th century English
texts' or `this corpus contains only French Symbolist poetry', and
one then has to explain why certain texts were included and others
excluded (i.e. what one means by `19th century English' or
`French Symbolist poetry'). But as long as the purpose of a corpus is
to study the linguistic features of a collection of texts, one cannot
apply any constraints to the linguistic features themselves. Proverbs,
like `past tense' or `imperatives', are a) in the eye of the beholder
(is the use of the historic present an example of `past tense' or not?)
and b) to be studied as they occur in authentic texts. A corpus is like
a natural landscape in which one might look at the distribution of
dandelions. A collection of proverbs is like a collection of dandelions
in a botanist's laboratory. So a dictionary of proverbs would be the
ideal resource for the original emailer, if he/she is not interested in
the contexts and texts in which they occur, the purposes for which they are
used, who uses them and when, etc. But of course, using a dictionary means
that one is bound by the decisions made by someone else about what a
proverb is, and which proverbs are worth including in the dictionary, etc.
Ramesh

Ramesh Krishnamurthy
Corpus Research Group
University of Birmingham


