<div><font face="VERDANA">I recently collaborated with a crowdsourcing researcher to answer a similar question, among others. We found that, over a large sample of words, every additional (coarse-grained) sense resulted in approximately 3% lower turker accuracy (accuracy being turkers' agreement with expert annotations from a SemEval-2007 task).<br>
<br>Adam Kapelner, Krishna Kaliannan, H. Andrew Schwartz, Lyle Ungar and Dean Foster. 2012. New Insights from Coarse Word Sense Disambiguation in the Crowd.
<i>In COLING 2012.</i></font><br><font color="#333333" face="VERDANA"><a href="http://www.seas.upenn.edu/%7Ehansens/COLing2012-poster-kapelner.pdf">pdf</a></font> <br><br>A few other findings, from the abstract: <br><br>
(a) the number of rephrasings within a sense definition is associated with higher accuracy; <br>(b) as word frequency increases, accuracy decreases even if the number of senses is kept constant; and<br>(c) spending more time is associated with a decrease in accuracy.<br>
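As a quick illustration of the accuracy measure discussed above (individual crowd labels scored against expert annotations, grouped by the size of the word's sense inventory), here is a minimal sketch; the items, labels, and counts are hypothetical and only show the bookkeeping, not our actual data:

```python
# Hypothetical annotations: each item carries a gold (expert) sense label,
# the labels assigned by crowd workers, and the word's coarse sense count.
items = [
    {"gold": "s1", "crowd": ["s1", "s1", "s2"], "senses": 2},
    {"gold": "s2", "crowd": ["s2", "s1", "s2"], "senses": 2},
    {"gold": "s3", "crowd": ["s1", "s3", "s2"], "senses": 5},
    {"gold": "s1", "crowd": ["s1", "s4", "s1"], "senses": 5},
]

# Accuracy = fraction of individual crowd labels that match the expert
# label, aggregated per sense-inventory size.
by_senses = {}
for it in items:
    hits = sum(lab == it["gold"] for lab in it["crowd"])
    correct, total = by_senses.get(it["senses"], (0, 0))
    by_senses[it["senses"]] = (correct + hits, total + len(it["crowd"]))

for n, (correct, total) in sorted(by_senses.items()):
    print(f"{n} senses: accuracy = {correct / total:.2f}")
```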
<br>Best, <br><br>-Andy<br><br><br></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">Sorry if this is a basic question for computational linguists; I'm a corpus linguist.
<p>
I'm wondering if there has been much research on inter-rater reliability
of word sense disambiguation by raters on something like Mechanical
Turk. For example:
</p><p>
-- Given some verbs that have 5 word senses each in WordNet (e.g. the
words tag, tame, taste, temper), how well do native speakers agree on
the word sense for these verbs in context?
-- How does this inter-rater reliability change for words that might
have just two senses (e.g. the verbs taint, tamper, tan, tank) or maybe
10 senses (e.g. the verbs shift, spread, stop, trim)?
(In other words, intuition suggests that for words with two WordNet
senses, there might be higher inter-rater reliability than those words
with five senses, and that for words with 10 WN senses, inter-rater
reliability would be pretty bad.)
-- Semantically, which kinds of 2 / 5 / 10 WN entry words have the best
inter-rater reliability, and which have the worst?
</p><p>
Thanks in advance.
</p><p>
Mark Davies
</p><p>
============================================
Mark Davies
Professor of Linguistics / Brigham Young University
<a href="http://davies-linguistics.byu.edu/">http://davies-linguistics.byu.edu/</a>
</p><p>
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================</p></blockquote><div><br>-- <br><a href="http://www.seas.upenn.edu/%7Ehansens/" target="_blank"><span style="font-family:arial,helvetica,sans-serif"></span><span style="font-family:arial,helvetica,sans-serif">H. Andrew Schwartz</span></a><br style="font-family:arial,helvetica,sans-serif">
<span style="font-family:arial,helvetica,sans-serif">Postdoctoral Fellow</span><br style="font-family:arial,helvetica,sans-serif">
<span style="font-family:arial,helvetica,sans-serif">Computer & Info. Science /</span><br style="font-family:arial,helvetica,sans-serif">
<span style="font-family:arial,helvetica,sans-serif">Lead Research Scientist<br><a href="http://wwbp.org" target="_blank">WWBP</a></span>, Pos. Psychol. Center<br style="font-family:arial,helvetica,sans-serif">
<span style="font-family:arial,helvetica,sans-serif">University of Pennsylvania<br>215-746-5085</span> <br></div>