[Corpora-List] What is corpora and what is not?

Angus Grieve-Smith grvsmth at panix.com
Mon Oct 8 00:18:33 UTC 2012


On 10/7/2012 10:41 AM, Graham White wrote:
> i) People are trying to find a definition of a term, whereas a good deal
> of experience in disciplines such as corpus linguistics seems to show 
> that a lot of terms do not have definitions in the sense that people 
> are looking for (and that they are none the worse for that). Why do 
> people not apply the insights of their own discipline to this enterprise?

     Hear, hear!

     "Corpus" is clearly polysemous, as Ken reported from the OED. In 
some situations it means "a collection of texts" (i.e. a collection that 
can be as small as a single text); in others it means "a collection of 
texts used for linguistic analysis," or "a collection of 
machine-readable texts," or even "a *good* collection of texts."

     I'm one of the people on this list who push the hardest and most 
obnoxiously for representative corpora, but I really don't think that 
representativeness should be part of the definition.  We already have a 
perfectly good phrase, "representative corpus."

     It's a common gatekeeping tactic to restrict a definition to only 
the members you like - and I've documented it in "transgender" and 
"linguistics" among many other places.  Tolstoy famously found it used 
to great effect in "art."

     Category gatekeeping is a tempting tactic, and I can understand why 
some people would use it.  But it's a fundamentally 
dishonest tactic, and one that people who value science should avoid.  
No offense to the people on the list who did, but I do disagree with 
your choice and I hope you will change your mind.

     And of course, every corpus is representative of something - itself 
at a minimum.  Many are not in fact representative of much else.  The 
ARTFL corpus (on which I have spent many hours, and on which I have 
based a major study) is representative of the canon of French classics 
that were in the public domain as of 1968 or so. It's not necessarily 
possible to generalize from it to anything else. But it's still a 
corpus, and so is any other collection of texts.

> ii) Apart from saying that a corpus is a collection of texts (which, as
> everyone remarks, is not very illuminating), then it seems to come 
> down to saying something about the purposes for which people collect 
> and use corpora. But these purposes are extremely varied, corpora can 
> be collected for one purpose and used for another, and so on. Corpus 
> linguistics is, after all, an applied discipline, and the purposes 
> very often come from the discipline applying the methods of corpus 
> linguistics: it's very hard to survey the purposes of those disciplines,
> and not for us to stipulate them.

     Yes.  Definitions don't have to be illuminating.  A "pack" is a 
collection of dogs (or cards, or cigarettes, etc.).  Of course there's 
more to the story than that, but the definition doesn't have to tell the 
story.

-- 
				-Angus B. Grieve-Smith
				grvsmth at panix.com
