Dear Mark, John,<div><br></div><div>Let me confess to a moment of embarrassment that I've been anxious about for years: following SENSEVAL-1 I did a (tiny) experiment to establish inter-annotator agreement, and came up with the 95% figure cited by John. </div>
<div><br></div><div>From experience since, I think the findings were not sound: it is most unusual to get a figure that high. I regret having published it (and, worse, having put it in the title of a short paper from EACL-99).</div>
<div><br></div><div>For either automatic WSD, or even for the gold standard, I agree entirely with John: </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Miss Elliott, my high-school English teacher, wouldn't give<br>anyone a gold star [for work like that]</blockquote><div><br></div><div>Adam <br><br><div class="gmail_quote">On 16 July 2013 01:59, John F Sowa <span dir="ltr">&lt;<a href="mailto:sowa@bestweb.net" target="_blank">sowa@bestweb.net</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On 7/15/2013 6:15 PM, Kilian Evang wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Off the top of my head, here's two relevant studies on inter-rater<br>
reliability for WSD, one for the case of expert annotators and one for<br>
the case of non-experts:<br>
<br>
<a href="http://link.springer.com/article/10.1023/A:1002693207386#page-1" target="_blank">http://link.springer.com/article/10.1023/A:1002693207386#page-1</a><br>
</blockquote>
<br></div>
From the abstract at the pointy end of this pointer:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The exercise identifies the state-of-the-art for fine-grained word sense<br>
disambiguation, where training data is available, as 74–78% correct, with<br>
a number of algorithms approaching this level of performance. For systems<br>
that did not assume the availability of training data, performance was<br>
markedly lower and also more variable. Human inter-tagger agreement was<br>
high, with the gold standard taggings being around 95% replicable.<br>
</blockquote>
<br>
Implication: For a 300-word page of text, a state-of-the-art program<br>
would have about 75 errors. That would be an average of two errors<br>
for 8-word sentences, or five errors for 20-word sentences.<br>
<br>
For the "gold" standard, there would still be 15 errors in a 300-word<br>
page. Miss Elliott, my high-school English teacher, wouldn't give<br>
anyone a gold star for 15 errors per page.<span><font color="#888888"><br>
<br>
John</font></span><div><div><br>
<br>
_______________________________________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>========================================<br><a href="http://www.kilgarriff.co.uk/" target="_blank">Adam Kilgarriff</a> <a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.com</a> <br>
Director <a href="http://www.sketchengine.co.uk/" target="_blank">Lexical Computing Ltd</a> <br>Visiting Research Fellow <a href="http://leeds.ac.uk" target="_blank">University of Leeds</a> <div>
<i><font color="#006600">Corpora for all</font></i> with <a href="http://www.sketchengine.co.uk" target="_blank">the Sketch Engine</a> </div><div> <i><a href="http://www.webdante.com" target="_blank">DANTE: <font color="#009900">a lexical database for English</font></a><font color="#009900"> </font> </i><div>
========================================</div></div>
</div>