Corpora: Collaborative effort
Bob Krovetz
krovetz at research.nj.nec.com
Tue Jun 13 01:01:03 UTC 2000
Jeremy Clear wrote:
>... That's the crucial thing -- you spend no significant
>time agonizing over the task; you just quickly pick some concordance
>lines and send them in. Sure, not everyone will agree 100% that the
>lines you've picked exactly match the sense I posted (first because
>the sense I posted was just an arbitrary definition taken from one
>dictionary which is clearly inadequate to define and delimit precisely
>a semantic range; and second, because no-one is going to validate or
Philip Resnik wrote:
>I agree -- especially since tolerance of noise is necessary even when
>working with purportedly "quality controlled" data. And one can
>always post-process to clean things up if quality becomes an issue
I don't mean to put a damper on this idea, but we should expect the
agreement rate to be far from 100%. Also, how much noise we can tolerate
will depend on how much noise there is. I compared the sense tagging of
the Brown files in Semcor with the tagging of the same files in the DSO
corpus, and found an agreement rate of 56%. This is exactly the rate of
agreement we would expect by chance. So post-processing could turn out
to be quite a bit of work!
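To see why raw agreement can mislead, compare it with the agreement expected by chance. Below is a minimal Python sketch. It assumes two parallel lists of sense tags for the same tokens. The function name agreement_stats and the toy tags are hypothetical and are not data from the Semcor/DSO comparison. The sketch estimates chance agreement the way Cohen's kappa does, which is one common convention and not necessarily the method used in the post.

```python
from collections import Counter


def agreement_stats(tags_a, tags_b):
    """Return (observed agreement, chance agreement, Cohen's kappa)
    for two parallel lists of sense tags over the same tokens."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(tags_a)
    # Observed agreement: fraction of tokens both taggers labelled identically.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement: probability that two taggers, each drawing
    # independently from their own sense distribution, would coincide.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    # Kappa rescales agreement so that 0 = chance level and 1 = perfect.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return p_o, p_e, kappa


# Hypothetical tags for eight occurrences of one ambiguous word.
corpus_a = ["s1", "s1", "s2", "s1", "s2", "s1", "s1", "s2"]
corpus_b = ["s1", "s2", "s2", "s1", "s1", "s1", "s2", "s2"]
p_o, p_e, kappa = agreement_stats(corpus_a, corpus_b)
print(f"observed={p_o:.3f} chance={p_e:.3f} kappa={kappa:.3f}")
# prints "observed=0.625 chance=0.500 kappa=0.250"
```

In this toy case the taggers agree on 62.5% of tokens, but half of that agreement would be expected by chance, so kappa is only 0.25. When observed agreement falls to the chance rate, kappa is 0.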
Bob
krovetz at research.nj.nec.com