[Corpora-List] WSD / # WordNet senses / Mechanical Turk
Karen Fort
karen.fort at loria.fr
Tue Jul 16 13:36:13 UTC 2013
Hi all,
May I offer this for consideration (from Fort et al., 2011: http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00057):
"in (Bhardwaj et al. 2010),[...] it is shown that, for their task of word sense disambiguation, a small number of trained annotators are superior to a larger number of untrained Turkers. On that point, their
results contradict that of (Snow et al. 2008), whose task was much simpler (the number
of senses per word was 3 for the latter, versus 9.5 for the former)"
Bhardwaj, Vikas, Rebecca Passonneau, Ansaf Salleb-Aouissi, and Nancy Ide.
2010. Anveshan: A tool for analysis of multiple annotators' labeling behavior.
In Proceedings of the Fourth Linguistic Annotation Workshop (LAW IV),
pages 47-55, Uppsala, Sweden.
Karën Fort
ATER ENSMN
Loria, Sémagramme team
Office C303
+33 (0)3 54 95 86 54
http://www.loria.fr/~fortkare/
----- Original Message -----
> From: "Benjamin Van Durme" <vandurme at cs.jhu.edu>
> To: corpora at uib.no
> Sent: Tuesday, 16 July 2013 14:32:38
> Subject: Re: [Corpora-List] WSD / # WordNet senses / Mechanical Turk
>
> Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap
> and Fast - But is it Good? Evaluating Non-Expert Annotations for
> Natural Language Tasks. EMNLP 2008.
> http://ai.stanford.edu/~rion/papers/amt_emnlp08.pdf
>
> "We collect 10 annotations for each of 177 examples of the noun
> “president” for the three senses given in SemEval. [...]
> performing simple majority voting (with random tie-breaking) over
> annotators results in a rapid accuracy plateau at a very high rate of
> 0.994 accuracy. In fact, further analysis reveals that there was
> only
> a single disagreement between the averaged non-expert vote and the
> gold standard; on inspection it was observed that the annotators
> voted
> strongly against the original gold la-bel (9-to-1 against), and that
> it was in fact found to be an error in the original gold standard
> annotation.6 After correcting this error, the non-expert accuracy
> rate
> is 100% on the 177 examples in this task. This is a specific example
> where non-expert annotations can be used to correct expert
> annotations. "
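
(As an aside: the aggregation scheme described above is straightforward to
reproduce. Below is a minimal Python sketch of majority voting with random
tie-breaking over several non-expert labels per item; the function names and
toy data are illustrative only, not taken from Snow et al.)

import random
from collections import Counter

def majority_vote(labels, rng=random):
    """Return the most frequent label, breaking ties uniformly at random."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return rng.choice(tied)

def voted_accuracy(annotations, gold):
    """annotations: item -> list of Turker labels; gold: item -> gold sense."""
    correct = sum(majority_vote(labels) == gold[item]
                  for item, labels in annotations.items())
    return correct / len(annotations)

# Toy example: two instances of "president", ten labels each, three senses.
annotations = {
    "president_001": ["s1"] * 9 + ["s2"],
    "president_002": ["s2"] * 6 + ["s3"] * 4,
}
gold = {"president_001": "s1", "president_002": "s2"}
print(voted_accuracy(annotations, gold))  # -> 1.0 on this toy data
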
>
> Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations
> of Word Sense in Parallel Corpora. NAACL Short. 2012.
> http://cs.jhu.edu/~vandurme/papers/YaoVanDurmeCallison-BurchNAACL12.pdf
>
>
> "2 Turker Reliability
>
> While Amazon’s Mechanical Turk (MTurk) has been been considered in
> the
> past for constructing lexical semantic resources (e.g., (Snow et al.,
> 2008; Akkaya et al., 2010; Parent and Eskenazi, 2010; Rumshisky,
> 2011)), word sense annotation is sensi- tive to subjectivity and
> usually achieves low agree- ment rate even among experts. Thus we
> first asked Turkers to re-annotate a sample of existing gold-
> standard
> data. With an eye towards costs saving, we also considered how many
> Turkers would be needed per item to produce results of sufficient
> quality.
>
> Turkers were presented sentences from the test portion of the word
> sense induction task of SemEval-2007 (Agirre and Soroa, 2007),
> covering 2,559 instances of 35 nouns, expert-annotated with OntoNotes
> (Hovy et al., 2006) senses. [...]
>
> We measure inter-coder agreement using Krip- pendorff’s Alpha
> (Krippendorff, 2004; Artstein and Poesio, 2008), [...]"
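
(For anyone who wants to run the same agreement check on their own Turker
labels, NLTK's agreement module provides a Krippendorff's Alpha
implementation. The sketch below uses made-up coder/item/label triples purely
for illustration and assumes nominal sense labels with the default binary
distance; it is not the authors' evaluation code.)

from nltk.metrics.agreement import AnnotationTask

# Data format: (coder, item, label) triples; coders need not label every item.
triples = [
    ("turker1", "sent1", "sense_a"), ("turker2", "sent1", "sense_a"),
    ("turker3", "sent1", "sense_b"),
    ("turker1", "sent2", "sense_c"), ("turker2", "sent2", "sense_c"),
    ("turker3", "sent2", "sense_c"),
]

task = AnnotationTask(data=triples)
print("Krippendorff's alpha:", task.alpha())  # 1.0 = perfect, 0 = chance-level
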
>
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
More information about the Corpora mailing list