Corpora: Collaborative venture

Tue Jun 13 15:13:27 UTC 2000

Dear all,

I have been following the discussion which sprang from this
idea. I do think quite a few of the issues raised there
are interesting and far from resolved, but as Jem said,
they are not the point of the original message.

> I am not so foolish as to believe
>
> a) that all respondents would select the same citations if offered the
> same source set (this is the Consensus Issue)
>
> b) that the dictionary definition is "true" or "correct" or clearly defines
> the boundaries of a word sense (this is the Which Tagset? Issue)
>
> c) that all citations selected by respondents would be "correct" (this
> is the Quality Control Issue: aka the Noise Problem)

As for the noise problem, I do not think it is really a
problem: if anything, it shows that "sense" or "meaning" or
whatever you want to call it is fuzzy, and its borders even
more so. And the number of "really incorrect" citations
would probably be quite low anyway (let us linguists trust
ourselves :-)

Now I have a suggestion, which may or may not be worth
looking at. Suppose instead of just one definition, you
give all the definitions you find in one dictionary. Then
people could look at their corpora or whatever their
sources, and instead of looking for instances of just one
meaning, look for instances that match any one of the
definitions. It might save some time, as they would not
have to go back to the concordance once they had been
through it.
Also, do you think it would be useful to gather the
instances of meanings which are not defined or do not seem
to fit in any definitions in the dictionary? If the
database which you propose to create has that on top of the
instances which do match the definitions, it could help
future lexicographers in finding out new meanings for the
words they study.
But maybe that's pushing it too far.
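
To make the suggestion a little more concrete, here is a
rough sketch of how such a database might be laid out. It
is only an illustration: the function name, the sense ids
and the "unmatched" label are all assumptions of mine, not
part of the original proposal. Each citation is filed under
the definition a respondent matched it to, and anything
that fits no definition goes into a separate bucket.

```python
from collections import defaultdict

# Hypothetical label for citations that fit no dictionary definition.
UNMATCHED = "unmatched"


def build_citation_db(definitions, tagged_citations):
    """Group citations by the definition a respondent matched them to.

    definitions: dict mapping a sense id to its dictionary definition.
    tagged_citations: iterable of (citation, sense_id or None) pairs.
    """
    db = defaultdict(list)
    for citation, sense_id in tagged_citations:
        # A tag of None, or an id not in the dictionary, goes to UNMATCHED.
        key = sense_id if sense_id in definitions else UNMATCHED
        db[key].append(citation)
    return dict(db)
```

A future lexicographer could then look at the entries filed
under UNMATCHED as candidates for meanings the dictionary
does not yet record.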

Best,

Antoine

----------------------
Antoine Consigny
anconsig at liverpool.ac.uk