<div><font face="VERDANA">I recently collaborated with a crowdsourcing researcher to answer a similar question, among others. We found that, over a large sample of words, every additional (coarse-grained) sense resulted in approximately 3% lower accuracy among turkers (accuracy being the turkers' agreement with expert annotations from a SemEval-2007 task).<br>

Adam Kapelner, Krishna Kaliannan, H. Andrew Schwartz, Lyle Ungar and Dean Foster. 2012. New Insights from Coarse Word Sense Disambiguation in the Crowd.

<i>In COLING 2012:</i></font><br><font color="#333333" face="VERDANA"><a href="http://www.seas.upenn.edu/%7Ehansens/COLing2012-poster-kapelner.pdf">pdf</a></font> <br><br>A few other findings, from the abstract: <br><br>

(a) the number of rephrasings within a sense definition is associated with higher accuracy; <br>(b) as word frequency increases, accuracy decreases even if the number of senses is kept constant; and<br>(c) spending more time is associated with a decrease in accuracy.<br>

Best,  -Andy </div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">Sorry if this is a basic question for computational linguists; I'm a corpus linguist.

<p>

I'm wondering if there has been much research on inter-rater reliability

 of word sense disambiguation by raters on something like Mechanical 

Turk. For example:

</p><p>

-- Given some verbs that have 5 word senses each in WordNet (e.g. the 

words tag, tame, taste, temper), how well do native speakers agree on 

the word sense for these verbs in context?

-- How does this inter-rater reliability change for words that might 

have just two senses (e.g. the verbs taint, tamper, tan, tank) or maybe 

10 senses (e.g. the verbs shift, spread, stop, trim)?

(In other words, intuition suggests that for words with two WordNet 

senses, there might be higher inter-rater reliability than those words 

with five senses, and that for words with 10 WN senses, inter-rater 

reliability would be pretty bad.)

-- Semantically, which kinds of 2 / 5 / 10 WN entry words have the best 

inter-rater reliability, and which have the worst?

</p><p>

Thanks in advance.

</p><p>

Mark Davies

</p><p>

============================================

Mark Davies

Professor of Linguistics / Brigham Young University

<a href="http://davies-linguistics.byu.edu/">http://davies-linguistics.byu.edu/</a>

</p><p>

** Corpus design and use // Linguistic databases **

** Historical linguistics // Language variation **

** English, Spanish, and Portuguese **

============================================</p></blockquote><div><br>-- <br><a href="http://www.seas.upenn.edu/%7Ehansens/" target="_blank"><span style="font-family:arial,helvetica,sans-serif">H. Andrew Schwartz</span></a><br style="font-family:arial,helvetica,sans-serif">

<span style="font-family:arial,helvetica,sans-serif">Postdoctoral Fellow</span><br style="font-family:arial,helvetica,sans-serif">

<span style="font-family:arial,helvetica,sans-serif">Computer & Info. Science /</span><br style="font-family:arial,helvetica,sans-serif">

<span style="font-family:arial,helvetica,sans-serif">Lead Research Scientist<br><a href="http://wwbp.org" target="_blank">WWBP</a></span>, Pos. Psychol. Center<br style="font-family:arial,helvetica,sans-serif">

<span style="font-family:arial,helvetica,sans-serif">University of Pennsylvania<br>215-746-5085</span> <br></div>