[Corpora-List] WSD / # WordNet senses / Mechanical Turk

Adam Kilgarriff adam at lexmasterclass.com
Tue Jul 16 06:43:58 UTC 2013


Dear Mark, John,

Let me confess to a moment of embarrassment that I've been anxious about
for years: following SENSEVAL-1 I did a (tiny) experiment to establish
inter-annotator agreement, and came up with the 95% figure cited by John.

From experience since, I think the findings were not sound: it is most
unusual to get a figure that high. I regret having published it (and,
worse, having put it in the title of a short paper at EACL-99).

For automatic WSD, and even for the gold standard, I agree entirely
with John:

> Miss Elliott, my high-school English teacher, wouldn't give
> anyone a gold star [for work like that]


Adam

On 16 July 2013 01:59, John F Sowa <sowa at bestweb.net> wrote:

> On 7/15/2013 6:15 PM, Kilian Evang wrote:
>
>> Off the top of my head, here's two relevant studies on inter-rater
>> reliability for WSD, one for the case of expert annotators and one for
>> the case of non-experts:
>>
>> http://link.springer.com/article/10.1023/A:1002693207386#page-1
>>
>
> From the abstract at the pointy end of this pointer:
>
>> The exercise identifies the state-of-the-art for fine-grained word sense
>> disambiguation, where training data is available, as 74–78% correct, with
>> a number of algorithms approaching this level of performance. For systems
>> that did not assume the availability of training data, performance was
>> markedly lower and also more variable. Human inter-tagger agreement was
>> high, with the gold standard taggings being around 95% replicable.
>>
>
> Implication:  For a 300-word page of text, a state-of-the-art program
> would have about 75 errors.  That would be an average of two errors
> for 8-word sentences, or five errors for 20-word sentences.
>
> For the "gold" standard, there would still be 15 errors in a 300-word
> page.  Miss Elliott, my high-school English teacher, wouldn't give
> anyone a gold star for 15 errors per page.
>
> John
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
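
The per-page error counts in John's message follow from simple arithmetic.
A minimal sketch is given here, assuming (as the quoted estimate implicitly
does) that every word on the page is sense-tagged and that accuracy is
uniform across words. The function name expected_errors is illustrative only.

```python
def expected_errors(num_words: int, accuracy: float) -> float:
    """Expected number of mis-tagged words in a text of num_words words.

    Assumption: every word is tagged, and `accuracy` is the fraction
    of words tagged correctly (0.0 to 1.0).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return num_words * (1.0 - accuracy)


if __name__ == "__main__":
    # Figures from the quoted message: a 300-word page, roughly 75%
    # system accuracy, and roughly 95% gold-standard replicability.
    for label, accuracy in [("system (75%)", 0.75), ("gold standard (95%)", 0.95)]:
        print(f"{label}: about {round(expected_errors(300, accuracy))} errors per 300 words")
```

If only a subset of the words were tagged (for instance, only ambiguous
content words), num_words would be that subset's size and the counts would
be correspondingly lower.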



-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>

*DANTE: a lexical database for English* <http://www.webdante.com>
========================================