<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 10/7/2012 10:41 AM, Graham White
wrote:<br>
</div>
<blockquote cite="mid:50719498.5080306@eecs.qmul.ac.uk" type="cite">i)
People are trying to find a definition of a term, whereas a good
deal
of experience in disciplines such as corpus linguistics seems to
show that a lot of terms do not have definitions in the sense that
people are looking for (and that they are none the worse for
that). Why do people not apply the insights of their own
discipline to this enterprise?
<br>
</blockquote>
<br>
Hear, hear!<br>
<br>
"Corpus" is clearly polysemous, as Ken reported from the OED.
In some situations it means "a collection of texts" (i.e. a
collection that can be as small as a single text); in others,
"a collection of texts used for linguistic analysis"; in others,
"a collection of machine-readable texts"; and in some even
"a <b>good</b> collection of texts."<br>
<br>
I'm one of the people on this list who push the hardest and most
obnoxiously for representative corpora, but I really don't think
that representativeness should be part of the definition. We
already have a perfectly good phrase, "representative corpus."<br>
<br>
It's a common gatekeeping tactic to restrict a definition to
only the members you like - and I've documented it in "transgender"
and "linguistics" among many other places. Tolstoy famously found
it used to great effect in "art."<br>
<br>
Category gatekeeping is a tempting tactic, and I can understand
why some people reach for it. But it's a
fundamentally dishonest tactic, and one that people who value
science should avoid. No offense to the people on the list who used it,
but I do disagree with your choice and I hope you will change your
mind.<br>
<br>
And of course, every corpus is representative of something -
itself at a minimum. Many are not in fact representative of much
else. The ARTFL corpus (on which I have spent many hours, and on
which I have based a major study) is representative of the canon of
French classics that were in the public domain as of 1968 or so.
It's not necessarily possible to generalize from it to anything else.
But it's still a corpus, and so is any other collection of texts.<br>
<br>
<blockquote cite="mid:50719498.5080306@eecs.qmul.ac.uk" type="cite">
ii) Apart from saying that a corpus is a collection of texts
(which, as
everyone remarks, is not very illuminating), then it seems to come
down to saying something about the purposes for which people
collect and use corpora. But these purposes are extremely varied,
corpora can be collected for one purpose and used for another, and
so on. Corpus linguistics is, after all, an applied discipline,
and the purposes very often come from the discipline applying the
methods of corpus linguistics: it's very hard to survey the
purposes of those disciplines,
and not for us to stipulate them.
<br>
</blockquote>
Yes. Definitions don't have to be illuminating. A "pack" is a
collection of dogs (or cards, or cigarettes, etc.). Of course
there's more to the story than that, but the definition doesn't have
to tell the story.<br>
<br>
<pre class="moz-signature" cols="72">--
-Angus B. Grieve-Smith
<a class="moz-txt-link-abbreviated" href="mailto:grvsmth@panix.com">grvsmth@panix.com</a>
</pre>
</body>
</html>