[Corpora-List] WSD / # WordNet senses / Mechanical Turk

Andy Schwartz andy.schwartz at gmail.com
Tue Jul 16 21:25:47 UTC 2013


I recently collaborated with a crowdsourcing researcher to answer a
similar question, among others. We found, over a large sample of words,
that every additional (coarse-grained) sense resulted in approximately 3%
less accuracy among Turkers (accuracy here is the Turkers' agreement with
the expert annotations of a SemEval-2007 task).

Adam Kapelner, Krishna Kaliannan, H. Andrew Schwartz, Lyle Ungar and Dean
Foster. 2012. New Insights from Coarse Word Sense Disambiguation in the
Crowd. In COLING 2012.
pdf <http://www.seas.upenn.edu/%7Ehansens/COLing2012-poster-kapelner.pdf>
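
For anyone who wants to try this kind of analysis on their own data, here
is a minimal sketch of the accuracy measure described above (Turker
agreement with expert labels), followed by a plain least-squares slope of
per-word accuracy against number of senses. This is not the model used in
the paper. The words, labels, and sense counts are made-up placeholders,
and the function names are hypothetical.

```python
from collections import defaultdict

def accuracy_by_word(annotations, gold):
    """Fraction of Turker labels per word that match the expert label.

    annotations: list of (word, item_id, turker_label)
    gold: dict mapping item_id -> expert label
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for word, item_id, label in annotations:
        totals[word] += 1
        if label == gold[item_id]:
            hits[word] += 1
    return {word: hits[word] / totals[word] for word in totals}

def ols_slope(xs, ys):
    """Slope of the simple least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy data (hypothetical, NOT from the paper): three placeholder words
# with 2, 3, and 4 coarse senses, one item each, four Turker labels each.
n_senses = {"w2": 2, "w3": 3, "w4": 4}
gold = {"i1": "s1", "i2": "s1", "i3": "s1"}
annotations = [
    ("w2", "i1", "s1"), ("w2", "i1", "s1"), ("w2", "i1", "s1"), ("w2", "i1", "s1"),
    ("w3", "i2", "s1"), ("w3", "i2", "s1"), ("w3", "i2", "s1"), ("w3", "i2", "s2"),
    ("w4", "i3", "s1"), ("w4", "i3", "s1"), ("w4", "i3", "s3"), ("w4", "i3", "s4"),
]

acc = accuracy_by_word(annotations, gold)
words = sorted(acc)
slope = ols_slope([n_senses[w] for w in words], [acc[w] for w in words])
print(acc, slope)
```

The slope is the change in accuracy per additional sense. A real analysis
would need many more words and would have to control for factors such as
word frequency.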

A few other findings, from the abstract:

(a) the number of rephrasings within a sense definition is associated with
higher accuracy;
(b) as word frequency increases, accuracy decreases even if the number of
senses is kept constant; and
(c) spending more time is associated with a decrease in accuracy.

Best,

-Andy


> Sorry if this is a basic question for computational linguists; I'm a corpus
> linguist.
>
> I'm wondering if there has been much research on inter-rater reliability
> of word sense disambiguation by raters on something like Mechanical Turk.
> For example:
>
> -- Given some verbs that have 5 word senses each in WordNet (e.g. the
> words tag, tame, taste, temper), how well do native speakers agree on the
> word sense for these verbs in context?
>
> -- How does this inter-rater reliability change for words that might have
> just two senses (e.g. the verbs taint, tamper, tan, tank) or maybe 10
> senses (e.g. the verbs shift, spread, stop, trim)? (In other words,
> intuition suggests that for words with two WordNet senses, there might be
> higher inter-rater reliability than those words with five senses, and that
> for words with 10 WN senses, inter-rater reliability would be pretty bad.)
>
> -- Semantically, which kinds of 2 / 5 / 10 WN entry words have the best
> inter-rater reliability, and which have the worst?
>
> Thanks in advance.
>
> Mark Davies
>
> ============================================
> Mark Davies
> Professor of Linguistics / Brigham Young University
> http://davies-linguistics.byu.edu/
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>

-- 
H. Andrew Schwartz <http://www.seas.upenn.edu/%7Ehansens/>
Postdoctoral Fellow
Computer & Info. Science /
Lead Research Scientist
WWBP <http://wwbp.org>, Pos. Psychol. Center
University of Pennsylvania
215-746-5085