Corpora: Collaborative effort

Robert Luk (COMP staff) csrluk at comp.polyu.edu.hk
Tue Jun 13 02:47:51 UTC 2000


> From krovetz at research.nj.nec.com Tue Jun 13 10:03:46 2000
> Date: Mon, 12 Jun 2000 21:59:37 -0400
> From: Bob Krovetz <krovetz at research.nj.nec.com>
> To: corpora at hd.uib.no
> Subject: Re: Corpora: Collaborative effort
>
> Robert Luk wrote:
>
> >Consider that one has 6 sense tags and the other also has 6 sense tags for the same
> >word in a sentence, assuming that they use the same set of sense tags
> >(although not likely). The likelihood that the two tagging
> >algorithms agreed by chance (independently) is 6 x 1/6 x 1/6. So, the
> >above seems to be true if there are 2 sense tags for the word:
> >
> >	2 x 1/2 x 1/2.
> >
> >Is this correct?
>
> In the case of Semcor and DSO, the sense inventory was the same (WordNet).
> The rate of agreement I mentioned was the agreement we would get by
> tagging all instances with the most frequent sense for the word in the corpus.

I was referring to agreement "by chance". I was just wondering how you arrived at 0.56 for
agreement by chance between the tags assigned by 2 different tagging algorithms.
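
To make the calculation in the quoted example concrete, here is a minimal sketch. It assumes
two taggers that choose independently, first uniformly among k senses and then from arbitrary
sense distributions. The function names are illustrative only, and nothing here reproduces
the 0.56 figure, whose derivation is still the open question.

```python
from fractions import Fraction


def chance_agreement_uniform(k):
    """Chance that two independent, uniform taggers pick the same one of k senses."""
    # k ways to agree, each with probability (1/k) * (1/k).
    return k * Fraction(1, k) * Fraction(1, k)


def chance_agreement(p, q):
    """General case: sum over senses of p[s] * q[s] for two independent taggers.

    p and q map each sense tag to the probability that the tagger assigns it.
    """
    return sum(p[s] * q.get(s, 0) for s in p)


print(chance_agreement_uniform(6))  # 1/6
print(chance_agreement_uniform(2))  # 1/2
```

With uniform choice the chance agreement is simply 1/k. It rises above 1/k only when both
taggers favour the same senses, which the second function captures.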

> I don't see why you say it is not likely that they will use the same set of
> sense tags.

In that case, they are the same.

> How can we make meaningful comparisons between word-sense
> tagging systems without using the same word sense inventory?  That was
> the purpose of the SENSEVAL competition.

I agree if the sense tags have completely different meanings. However, the
differences between tags may be shades of meaning rather than a crisp
decision that they are or are not the same. We can still "compare"
them if we think of senses as graded: by comparing the contexts of the word usage, or
by having a human map the tags of one set onto the other. It is like merging 2 dictionaries.
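
As a rough sketch of what such a graded comparison could look like, assume a human has
written down a similarity between 0 and 1 for pairs of tags from two inventories. The tag
names, the numbers, and the function below are all hypothetical and only illustrate the idea.

```python
# Hypothetical hand-built mapping between two sense inventories, A and B.
# Each value is a human-assigned similarity in [0, 1]; unlisted pairs count as 0.
SIMILARITY = {
    ("bank_A1", "bank_B1"): 1.0,  # same sense
    ("bank_A1", "bank_B2"): 0.4,  # related shade of meaning
    ("bank_A2", "bank_B3"): 1.0,
}


def graded_agreement(tags_a, tags_b, similarity):
    """Average similarity of the tags two systems gave to the same word instances."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both systems must tag the same instances")
    if not tags_a:
        return 0.0
    total = sum(similarity.get((a, b), 0.0) for a, b in zip(tags_a, tags_b))
    return total / len(tags_a)


system_a = ["bank_A1", "bank_A1", "bank_A2"]
system_b = ["bank_B1", "bank_B2", "bank_B1"]
print(graded_agreement(system_a, system_b, SIMILARITY))
```

A crisp comparison is the special case in which every similarity is either 0 or 1.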

Some algorithms may work "WITH" a particular tag set and may not work well with
another tag set. Consider a system that uses a set of handcrafted
rules for tag assignment or for improving tag assignment. If these rules use
tag information to decide how other words are tagged, then we cannot automatically
abstract them to work with another tag set.

Best,

Robert Luk


