[Corpora-List] What is corpora and what is not?
Angus Grieve-Smith
grvsmth at panix.com
Mon Oct 8 00:18:33 UTC 2012
On 10/7/2012 10:41 AM, Graham White wrote:
> i) People are trying to find a definition of a term, whereas a good deal
> of experience in disciplines such as corpus linguistics seems to show
> that a lot of terms do not have definitions in the sense that people
> are looking for (and that they are none the worse for that). Why do
> people not apply the insights of their own discipline to this enterprise?
Hear, hear!
"Corpus" is clearly polysemous, as Ken reported from the OED. In
some situations it means "a collection of texts" (i.e. a collection that
can be as small as a single text); in others, "a
collection of texts used for linguistic analysis"; in others still, "a
collection of machine-readable texts"; and even "a *good* collection of
texts."
I'm one of the people on this list who push the hardest and most
obnoxiously for representative corpora, but I really don't think that
representativeness should be part of the definition. We already have a
perfectly good phrase, "representative corpus."
It's a common gatekeeping tactic to restrict a definition to only
the members you like - and I've documented it in "transgender" and
"linguistics" among many other places. Tolstoy famously found it used
to great effect in "art."
Category gatekeeping is a tempting tactic, and I can understand why
some people reach for it. But it's a fundamentally
dishonest tactic, and one that people who value science should avoid.
No offense to the people on the list who have used it, but I do disagree with
your choice and I hope you will change your mind.
And of course, every corpus is representative of something - itself
at a minimum. Many are not in fact representative of much else. The
ARTFL corpus (on which I have spent many hours, and on which I have
based a major study) is representative of the canon of French classics
that were in the public domain as of 1968 or so. It's not necessarily
possible to generalize from it to anything else. But it's still a corpus, and
so is any other collection of texts.
> ii) Apart from saying that a corpus is a collection of texts (which, as
> everyone remarks, is not very illuminating), then it seems to come
> down to saying something about the purposes for which people collect
> and use corpora. But these purposes are extremely varied, corpora can
> be collected for one purpose and used for another, and so on. Corpus
> linguistics is, after all, an applied discipline, and the purposes
> very often come from the discipline applying the methods of corpus
> linguistics: it's very hard to survey the purposes of those disciplines,
> and not for us to stipulate them.
Yes. Definitions don't have to be illuminating. A "pack" is a
collection of dogs (or cards, or cigarettes, etc.). Of course there's
more to the story than that, but the definition doesn't have to tell the
story.
--
-Angus B. Grieve-Smith
grvsmth at panix.com