[Corpora-List] WSD / # WordNet senses / Mechanical Turk
maxwell
maxwell at umiacs.umd.edu
Tue Jul 16 20:40:02 UTC 2013
On 2013-07-16 14:03, Angus Grieve-Smith wrote:
> On 7/16/2013 2:43 AM, Adam Kilgarriff wrote:
>
>> For either automatic WSD, or even for the gold standard, I agree
>> entirely with John:
>>
>>> Miss Elliott, my high-school English teacher, wouldn't give
>>> anyone a gold star [for work like that]
>
> Well, now hang on a minute.
>
> First of all, how unambiguous were these, originally? Real language
> is full of ambiguous uses. I'm surprised that the experts can even
> agree 95% of the time, and I'd guess that part of their expert
> training was aimed at exaggerating that agreement to satisfy brittle
> models that weren't built to handle persistent ambiguity.
>
> Second, if the human experts didn't even agree, why would Miss
> Elliott expect her students to do any better? What kind of a sadist
> was she?

Miss Elliott? Argh, I remember her!

All seriousness aside, I think we all know what's going on here.
Irregardless (ouch, I'm sorry Miss Elliott, that's "regardless"!) of
whether we think of this as a problem for humans (expert or otherwise)
or for computers, there's a clustering problem. We have a bunch of uses
of some word in texts. Some expert(s) have eyeballed a subset of these
tokens-in-text and decided to make N clusters, which are the N senses.
Later on someone else comes along and tries to cluster a new set of
tokens-in-text using these same N clusters (and still later we get a
computer to try to replicate those clusters-of-tokens).

But who's to say what N should be? It is well known that given M
experts who are asked to create senses based on the same corpus, for any
moderately polysemous word you'll get at least M different Ns. (Look at
any two dictionaries.) So asking those M experts to tag senses
according to the N senses chosen by one of those experts is asking
them to perform a task they don't really believe in. If you got
those experts in a room, you'd have Adriaen van Ostade's "Brawl"
(http://www.arthermitage.org/Adriaen-van-Ostade/Brawl.html).

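To make the clustering framing concrete: one standard way to compare two experts' sense clusterings of the same tokens, even when they chose different N, is the Rand index. It is the fraction of token pairs on which the two clusterings agree (both put the pair in the same sense, or both split it). This is a minimal illustrative sketch, not something from the thread; the function name and the sense labels are made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of token pairs on which two clusterings agree:
    both put the pair in the same cluster, or both split it."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same tokens")
    if len(labels_a) < 2:
        raise ValueError("need at least two tokens to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical sense labels for six tokens of one word.
# Expert 1 distinguishes N=3 senses; expert 2 lumps them into N=2.
expert1 = ["s1", "s1", "s2", "s2", "s3", "s3"]
expert2 = ["A", "A", "A", "A", "B", "B"]
print(rand_index(expert1, expert2))
```

With these made-up labels, expert 2 merges senses s1 and s2, and the two clusterings agree on 11 of the 15 token pairs (about 0.73). Label names do not matter, only the grouping; in practice a chance-corrected variant such as the adjusted Rand index is often preferred.
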
Granted, when experts tag word tokens for sense as input to a machine
learning algorithm, they probably aren't the experts who created the N
sense clusters in the first place. But I can still hear them saying,
"What idiot broke things up this way? Can you say, 'Gerrymandering'?"

(And no, I did not like the way Miss Elliott diagrammed sentences.
Unfortunately, when I did my PhD, I found out she was right.)

Mike Maxwell
U Maryland
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora