Corpora: Collaborative effort

Jem Clear jem at cobuild.collins.co.uk
Sun Jun 11 10:08:42 UTC 2000


George

Thanks for your comprehensive comments and suggestions. Here are some
rejoinders:

a) Of course the free, unrestricted distribution of the resulting
collection of citations grouped by word/sense category is ESSENTIAL.
That's the key point of the idea. Even well-meaning research projects
carried out with public funding often yield results which are
difficult to obtain in full because of the commercial sensitivities
of one or two of the commercial participants in the project
consortium and even (dare I say) because many universities these
days see themselves in competition with other research and learning
institutions and are sometimes reluctant to give away data like this.


b)
> When you post a word, list ALL of its senses and indicate which one
> you want to get...  In fact, it does not seem so useful to get just
> one sense.  Why not give the word and all of its senses? Let the
> participants sort the examples into sense1, sense2 etc.

Yes. But I really think the collaborative nature of the idea will
*only* work if the amount of effort required by any individual is
minimal. Once you present, say, a word like "run" and offer 26
different definitions in one block and ask people to submit citations
for all 26 categories then you are really asking people to commit a
significant amount of work to analyse the subtle variations in sense
distinction, and sort through potentially hundreds of thousands of
examples to pick out instances of each sense. My idea was that if you
see a word + definition pair you can (without thinking too hard about
it) pick from a corpus a few examples which seem, prima facie, to fit
the selected sense. We can worry about the fine distinctions, and
overlapping sense categories later!

c)
> One of current
> interest to me (Hint: Please use this one :-) ) is "today".
> Today1 = (N) The day of the utterance. Today is June 9.
> Today2 = (N) The current time period. Today's man is always busy.
> Today3 = (ADV) happening on the current day. I went to the store today.
> Today4 = (ADV) happening in the current period, nowadays.
>                Today, we use computers to communicate.

If this idea is to work, we cannot spend time and effort arguing over
the sense categories themselves. I personally think that in your
example above "today1" and "today3" are identical in meaning -- but
that's no problem for this collaborative venture as long as we don't
expect to get a database which covers *all* the sense distinctions
everyone would like to make.

Cheers

Jem

PS Thanks for the examples for "fierce"


