[Corpora-List] WSD / # WordNet senses / Mechanical Turk

maxwell maxwell at umiacs.umd.edu
Tue Jul 16 20:40:02 UTC 2013


On 2013-07-16 14:03, Angus Grieve-Smith wrote:
> On 7/16/2013 2:43 AM, Adam Kilgarriff wrote:
> 
>> For either automatic WSD, or even for the gold standard, I agree
>> entirely with John:
>> 
>>> Miss Elliott, my high-school English teacher, wouldn't give
>>> anyone a gold star [for work like that]
> 
>  Well, now hang on a minute.
> 
>  First of all, how unambiguous were these, originally? Real language
> is full of ambiguous uses. I'm surprised that the experts can even
> agree 95% of the time, and I'd guess that part of their expert
> training was aimed at exaggerating that agreement to satisfy brittle
> models that weren't built to handle persistent ambiguity.
> 
>  Second, if the human experts didn't even agree, why would Miss
> Elliott expect her students to do any better? What kind of a sadist
> was she?

Miss Elliott?  Argh, I remember her!

All seriousness aside, I think we all know what's going on here.  
Irregardless (ouch, I'm sorry Miss Elliott, that's "regardless"!) of 
whether we think of this as a problem for humans (expert or otherwise) 
or for computers, there's a clustering problem.  We have a bunch of uses 
of some word in texts.  Some expert(s) have eyeballed a subset of these 
tokens-in-text and decided to make N clusters, which are the N senses.  
Later on someone else comes along and tries to cluster a new set of 
tokens-in-text using these same N clusters (and still later we get a 
computer to try to replicate those clusters-of-tokens).
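
To make the "re-clustering" step concrete: when two taggers assign the same new tokens to the same N senses, their agreement is usually reported as raw percent agreement (like the 95% figure quoted above), sometimes alongside a chance-corrected measure. The sketch below is illustrative only. It assumes each tagger produces one sense label per token, the "bank" labels are hypothetical, and Cohen's kappa is a standard agreement statistic brought in here, not something anyone in this thread proposed.

```python
from collections import Counter

def observed_agreement(tags_a, tags_b):
    """Fraction of tokens to which both taggers assigned the same sense."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag lists")
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)

def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa: observed agreement corrected for the agreement the
    two taggers' sense distributions would produce by chance."""
    n = len(tags_a)
    p_o = observed_agreement(tags_a, tags_b)
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    # Chance agreement: probability both pick sense s independently, summed over s.
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both taggers used one identical sense throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical tags for ten tokens of "bank", using an N=2 inventory.
tagger_a = ["river", "money", "money", "river", "money",
            "money", "money", "river", "money", "money"]
tagger_b = ["river", "money", "money", "money", "money",
            "money", "money", "river", "money", "money"]

print(round(observed_agreement(tagger_a, tagger_b), 2))  # 0.9
print(round(cohen_kappa(tagger_a, tagger_b), 2))         # 0.74
```

Note that both numbers are only defined relative to a fixed inventory of N senses; neither says anything about whether N was the right choice in the first place.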

But who's to say what N should be?  It is well known that if you ask M 
experts to create senses from the same corpus, then for any moderately 
polysemous word you'll likely get M different Ns.  (Look at any two 
dictionaries.)  So asking those M experts to tag senses according to 
the N senses chosen by one of them is asking them to perform a task 
that they don't really believe in.  If you got 
those experts in a room, you'd have 
http://www.arthermitage.org/Adriaen-van-Ostade/Brawl.html.

Granted, when experts tag word tokens for sense as input to a machine 
learning algorithm, they probably aren't the experts who created the N 
sense clusters in the first place.  But I can still hear them saying, 
"What idiot broke things up this way?  Can you say, 'Gerrymandering'?"

(And no, I did not like the way Miss Elliott diagrammed sentences. 
Unfortunately, when I did my PhD, I found out she was right.)

    Mike Maxwell
    U Maryland

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora