[Corpora-List] Using MTurk for markup tasks (was Cost of part of speech tagging)

Mike Maxwell maxwell at umiacs.umd.edu
Tue Dec 26 21:03:50 UTC 2006


Alexandre Rafalovitch wrote:
> An interesting approach would be to use Amazon Mechanical Turk for
> these kinds of tasks.
> ...
> Has anybody else given a thought to this?

I don't know which languages you're interested in.  I have thought about 
"wikifying" other sorts of projects (like finding and keeping track of 
on-line computational resources, or building bilingual text collections) 
for "low density" languages.  I have never actually tried this, but it 
may be instructive to look at the languages for which there are 
substantial Wikipedia and Wiktionary resources.  Last time I looked, the 
usual suspects (the major and some "minor" European languages, plus 
Japanese) had at least 100k Wikipedia articles, while there was a 
slightly wider variety of languages with at least 10k Wikipedia articles 
(including Arabic (= MSA), Persian, Hebrew, Bahasa Indonesia, Korean, 
Malay, Thai, Turkish and Chinese).  For comparison, the English 
Wikipedia has 1.5 million articles.

My guess is that "wikification" (in which I include Amazon Mechanical 
Turk) will work best for languages with a substantial number of 
speakers who have idle time, sufficient income to afford a computer 
and network connection, and sufficient education for the specific 
annotation task.
-- 
	Mike Maxwell
	maxwell at umiacs.umd.edu


