Corpora: Collaborative effort

Jem Clear jem at cobuild.collins.co.uk
Sun Jun 11 10:18:05 UTC 2000


> Brilliant idea. But, who does all the processing involved, or do the
> contributors make these decisions? Attached is a list of 1500+
> concordanced lines from about 100 million words I've sucked off the
> web.  Probably useless unless someone has the time to decide how and
> what to include.

John

The key part of the idea is that anyone can, without too much effort
or time, pick some (two or three or twenty or whatever) examples from
a corpus they have to hand which **they think** match the posted
sense. So the 1500 examples you sent me aren't much use, I'm afraid,
because you have simply concordanced "fierce" from a large
corpus. What I hoped was that you would pick a few of those
concordance lines which you think match the sense of "fierce" I
posted. Then, through minimal effort on your part, plus the minimal
effort of some tens of other willing participants, we would all have a
set of maybe a hundred concordance lines showing "fierce" used in the
posted sense. That's the crucial thing -- you spend no significant
time agonizing over the task; you just quickly pick some concordance
lines and send them in. Sure, not everyone will agree 100% that the
lines you've picked exactly match the sense I posted (first because
the sense I posted was just an arbitrary definition taken from one
dictionary which is clearly inadequate to define and delimit precisely
a semantic range; and second, because no-one is going to validate or
check your examples to make sure they do indeed exemplify the required
sense). The point is there may be fuzzy edges, but in the main the
examples collected will be a valuable dataset.

Thanks for your contribution

Jem


