[Corpora-List] frequent meanings of a word

Saif Mohammad uvgotsaif at gmail.com
Fri Mar 17 19:49:16 UTC 2006


Hi Mimi,

While manually determining the intended senses (and thereby sense
dominance) may be more accurate, it is time-intensive. I would like to
bring to your attention the following automatic methods that determine
word sense dominance from unannotated text:

(1) "Finding predominant senses in untagged text" McCarthy, D.,
Koeling, R., Weeds, J. and Carroll, J. In Proceedings of the 42nd
Annual Meeting of the Association for Computational Linguistics. 2004,
Barcelona, Spain. pp. 280-287.

(2) "Determining Word Sense Dominance Using a Thesaurus", Mohammad,
S. and Hirst, G. To appear in Proceedings of the 11th Conference of
the European Chapter of the Association for Computational Linguistics
(EACL-2006). April 2006, Trento, Italy.

I am not sure which sense inventory you are using, but it should be
noted that both of these approaches are somewhat tied to specific
sense inventories. The McCarthy et al. method combines distributional
and semantic measures of similarity to determine sense dominance, and
so relies on WordNet. The second approach (proposed by Hirst and me)
relies on a set of ambiguous words that together unambiguously
represent a sense, and so we use a published thesaurus as the sense
inventory (its categories roughly correspond to coarse senses). Both
approaches can also be used to obtain domain-specific sense dominance.

If you are using WordNet as the sense inventory, and all you need is
the domain-free predominant sense of each word, or a rough ranking of
its senses by frequency, then that can be obtained directly from
WordNet itself: the senses of a word are listed in order of their
dominance in the SemCor corpus. Note, however, that senses not found
in SemCor are listed in arbitrary order, and that SemCor is relatively
small (about 250,000 words).

Good luck,
-Saif

On 3/17/06, Ziwei Huang <aexzh1 at nottingham.ac.uk> wrote:
> Hello, I have a methodological question that needs your kind help:
>
> I need to look at the frequently used meanings of about 200 different words in the BNC corpus, and wonder whether there is a quick/easy way to do that?
>
> My intended approach is to randomly select 500 instances of a word from the whole corpus, then
> look at 50 of those instances (say, the 1st, 11th, 21st ... 491st) and list the meanings (and their frequencies) of the word; then move to the next 50 instances (the 2nd, 12th ... 492nd) to see whether any new meanings come up; if any do, move on to the next 50 instances, and repeat until no more new meanings emerge.
>
> Can someone kindly tell me whether this approach is acceptable for describing the frequently used meanings of a word (or the 'default' meanings of a word in actual use), and whether there is any reference/source for this (or any other quick and easy) methodology?
>
> Many thanks!
>
> Mimi
>
>
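For what it is worth, the sampling scheme Mimi describes above can be
sketched as follows. This is only an illustration: the `instances`
list and the `sense_of` function are hypothetical placeholders for the
500 extracted concordance lines and the manual sense judgement of each
line, respectively.

```python
# Sketch of the proposed stratified pass over 500 sampled instances:
# inspect every 10th instance starting at offset 0 (1st, 11th, ...,
# 491st), then offset 1 (2nd, 12th, ..., 492nd), and so on, stopping
# once a pass turns up no sense that was not already seen.
import random
from collections import Counter


def survey_senses(instances, sense_of, step=10, rounds=10):
    counts = Counter()  # frequency of each sense among inspected lines
    seen = set()        # senses encountered so far
    for offset in range(rounds):
        batch = instances[offset::step]  # 50 lines per pass
        new = set()
        for line in batch:
            sense = sense_of(line)       # manual judgement, in practice
            counts[sense] += 1
            if sense not in seen:
                new.add(sense)
        seen |= new
        if not new and offset > 0:       # no new meanings emerged: stop
            break
    return counts


# Toy usage: pretend each instance is already tagged with its sense.
random.seed(0)
instances = [random.choice(["FINANCE", "RIVERSIDE", "TILT"])
             for _ in range(500)]
counts = survey_senses(instances, sense_of=lambda line: line)
```

The slice `instances[offset::step]` picks out exactly the 1st, 11th,
..., 491st items for offset 0, matching the scheme in the message. As
noted above, though, such a procedure will tend to miss rare senses,
which is part of why the automatic dominance methods may be worth
considering.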


--
Saif Mohammad
University of Toronto
http://www.cs.toronto.edu/~smm


