[Corpora-List] WSD / # WordNet senses / Mechanical Turk
Benjamin Van Durme
vandurme at cs.jhu.edu
Tue Jul 16 12:32:38 UTC 2013
Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap
and Fast - But is it Good? Evaluating Non-Expert Annotations for
Natural Language Tasks. EMNLP 2008.
http://ai.stanford.edu/~rion/papers/amt_emnlp08.pdf
"We collect 10 annotations for each of 177 examples of the noun
“president” for the three senses given in SemEval. [...]
performing simple majority voting (with random tie-breaking) over
annotators results in a rapid accuracy plateau at a very high rate of
0.994 accuracy. In fact, further analysis reveals that there was only
a single disagreement between the averaged non-expert vote and the
gold standard; on inspection it was observed that the annotators voted
strongly against the original gold la-bel (9-to-1 against), and that
it was in fact found to be an error in the original gold standard
annotation.6 After correcting this error, the non-expert accuracy rate
is 100% on the 177 examples in this task. This is a specific example
where non-expert annotations can be used to correct expert
annotations. "
Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations
of Word Sense in Parallel Corpora. NAACL Short. 2012.
http://cs.jhu.edu/~vandurme/papers/YaoVanDurmeCallison-BurchNAACL12.pdf
"2 Turker Reliability
While Amazon’s Mechanical Turk (MTurk) has been been considered in the
past for constructing lexical semantic resources (e.g., (Snow et al.,
2008; Akkaya et al., 2010; Parent and Eskenazi, 2010; Rumshisky,
2011)), word sense annotation is sensi- tive to subjectivity and
usually achieves low agree- ment rate even among experts. Thus we
first asked Turkers to re-annotate a sample of existing gold- standard
data. With an eye towards costs saving, we also considered how many
Turkers would be needed per item to produce results of sufficient
quality.
Turkers were presented sentences from the test portion of the word
sense induction task of SemEval-2007 (Agirre and Soroa, 2007),
covering 2,559 instances of 35 nouns, expert-annotated with OntoNotes
(Hovy et al., 2006) senses. [...]
We measure inter-coder agreement using Krip- pendorff’s Alpha
(Krippendorff, 2004; Artstein and Poesio, 2008), [...]"
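
If anyone wants to replicate the agreement measurement, here is a
minimal sketch of Krippendorff's Alpha for nominal labels, using the
coincidence-matrix formulation from Krippendorff (2004). The function
name and toy data are my own, purely illustrative; for real work you
would likely use an existing, tested implementation:

from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` maps each annotated item to the list of labels it received
    (missing annotations simply omitted). Items with fewer than two
    labels carry no agreement information and are skipped.
    """
    coincidence = Counter()          # o_ck: coincidence matrix
    for labels in units.values():
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels from different coders contributes
        # 1/(m-1) to the coincidence matrix.
        for i, j in permutations(range(m), 2):
            coincidence[(labels[i], labels[j])] += 1.0 / (m - 1)

    n_c = Counter()                  # marginal total per label
    for (c, _), v in coincidence.items():
        n_c[c] += v
    n = sum(n_c.values())

    # Observed vs. expected disagreement (nominal distance: 0/1).
    observed = sum(v for (c, k), v in coincidence.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 - observed / expected if expected else 1.0

# Toy usage: four hypothetical instances, each labeled by three Turkers.
print(krippendorff_alpha_nominal({
    "i1": ["s1", "s1", "s1"],
    "i2": ["s2", "s2", "s1"],
    "i3": ["s1", "s1", "s2"],
    "i4": ["s2", "s2", "s2"],
}))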