[Corpora-List] Many thanks and a summary of all the responses.

Yuanyong Wang wyy at cse.unsw.EDU.AU
Tue May 10 00:54:19 UTC 2005


      Many thanks. The responses definitely cleared up my confusion and
provided me with more valuable information on how to use the
Senseval test data and on the issue of word sense disambiguation itself.

      All the responses are summarized below:


                               The question:
...I am planning to conduct experiments
on the Senseval-3 data. But after reading the answer key file, one fact
appears a bit confusing: sometimes, for one test case, multiple sense
tags are given, and one of the multiple sense tags could be simply the
letter "U". ... How should one make sense of those multiple sense tag cases?



                           Response from Jordi:

   As far as I know, there are two special tags in SENSEVAL-II (and
this probably also applies to SENSEVAL-III):

P: meaning PROPER-NAME
U: meaning UNASSIGNABLE

Note that, as there were multiple annotators, you can find a disjunction of
the tag "U" with a sense tag.

You can find a description of the tags and how they were developed in
"English Lexical Sample Task Description" by Adam Kilgarriff, available at
<http://www.itri.bton.ac.uk/events/senseval/englexsamp.ps>



                           Response from Adam:

The trouble with word sense disambiguation is word senses. They just won't
behave.

Sometimes, the best that a human can do is to say that a corpus instance is
related to more than one word sense (so it is tagged with multiple sense
tags), or that it is unassignable (U), or that it is like one of the senses in
one way but not in others (a combination of U and one or more regular sense
tags). This is the scheme we have used for English for all three Sensevals.
You can find descriptions in the SENSEVAL 1 Special Issue of Computers and
the Humanities 34 (1-2), amongst other places. Here are links to papers that
discuss it:

         Best
                 Adam

2000 (with Joseph Rosenzweig) "English Framework and Results."
<http://www.lexmasterclass.com/people/Publications/2000-KilgRosenzweig-Senseval1frame.pdf>
Computers and the Humanities 34 (1-2), Special Issue on SENSEVAL.
2000 (with Martha Palmer) "Introduction to the Special Issue on SENSEVAL."
<http://www.lexmasterclass.com/people/Publications/2000-KilgPalmer-Senseval1Intro.pdf>
Computers and the Humanities 34 (1-2). (Also guest editors for the Special Issue.)



                             Response from Diana:


You will probably hear from the task organisers directly.  U is
normally given for Unassigned tags where annotators are not sure what
the appropriate tag is, and multiple tags are given where several senses
are appropriate for the same instance. These details should be in the
task descriptions in the proceedings (which you can obtain from the web
page).
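
The cases described in the responses so far (several sense tags, "U" alone,
and "U" combined with sense tags) can be told apart mechanically. Below is a
minimal Python sketch, not an official Senseval tool. It assumes each key
line has the layout "<lemma> <instance id> <tag> [<tag> ...]" separated by
whitespace; check this against your actual key file. The function name, the
category labels, and the sample line in the usage note are all hypothetical.

```python
# Special tags as described in the responses above.
SPECIAL_TAGS = {"U", "P"}


def parse_key_line(line):
    """Split one answer-key line into its parts and classify it.

    Assumed layout (verify against your key file):
        <lemma> <instance id> <tag> [<tag> ...]
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected lemma, instance id and at least one tag")
    lemma, instance_id, *tags = fields

    senses = [t for t in tags if t not in SPECIAL_TAGS]
    special = [t for t in tags if t in SPECIAL_TAGS]

    if senses and "U" in special:
        kind = "sense_plus_u"    # like a sense in one way but not in others
    elif "U" in special:
        kind = "unassignable"    # no listed sense fits
    elif len(senses) > 1:
        kind = "multiple"        # several senses judged appropriate
    elif len(senses) == 1:
        kind = "single"
    else:
        kind = "proper_name"     # only "P" remains

    return {
        "lemma": lemma,
        "instance_id": instance_id,
        "senses": senses,
        "special": special,
        "kind": kind,
    }
```

For example, parse_key_line("art.n art.n.003 sense_a U") would be classified
as "sense_plus_u" with senses ["sense_a"]. Counting the "kind" values over a
whole key file gives a quick picture of how common each case is.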


                            Response from Ted:


    This is an interesting question, and there are really two ways to look at
it. First, you may have cases where the use of the word is truly ambiguous.

I wish I were a star.

This most likely means a movie star, but if the surrounding context is
vague or relates to science fiction or something, it might mean a star
like our sun. In a case like this, where there is true ambiguity,
a tagger might give multiple senses.

There are also cases where the sense distinctions are vague or very finely
grained. This is actually the main problem for taggers with respect to the
Senseval data, and much of this revolves around the sense inventory. For
example, the word "art" has several very closely related senses in
WordNet, and it's very hard to pick them apart. So in a case like this, a
tagger may have no choice but to pick multiple senses.

So, my advice is not to just look at the answer key, but rather look at
the senses that those key results are pointing to. I am fairly confident
that in many cases you'll see they are very finely grained distinctions
that are hard to pull apart.



            Thanks again for all the replies.



     Regards
     Robin


