[Corpora-List] frequent meanings of a word

Diana McCarthy dianam at sussex.ac.uk
Fri Mar 17 19:25:25 UTC 2006


Hello Mimi

I assume from what you say that you are going to do this manually? How are
you going to define the meanings? Is it just one person doing the
annotation? You might want to consider how well humans would agree on such a
task, e.g. look at measures of inter-tagger agreement on sense tagging such
as those used in the Senseval exercises.
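
Two common ways to quantify agreement between two annotators are raw percentage agreement and Cohen's kappa, which corrects for agreement expected by chance. The sketch below computes both. It is a minimal illustration, not the exact procedure used in Senseval, and the sense labels are hypothetical.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of instances to which both annotators assigned the same sense."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag lists of equal length")
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)


def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Chance agreement is estimated from each annotator's own sense distribution.
    """
    p_o = observed_agreement(tags_a, tags_b)
    n = len(tags_a)
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical sense throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sense tags from two annotators for six instances of "bank".
a = ["money", "money", "river", "money", "money", "river"]
b = ["money", "river", "river", "money", "money", "money"]
print(round(observed_agreement(a, b), 3))  # 0.667
print(round(cohens_kappa(a, b), 3))        # 0.25
```

Kappa is much lower than raw agreement here because both annotators favour "money", so many matches would be expected by chance alone.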

We (my colleagues at Sussex University and I) have been working on a way
to automatically estimate how frequent each meaning of a word is in a
corpus such as the BNC, using WordNet as our sense inventory. To do this we
parsed the BNC and then built a thesaurus from the parsed data using the
distributional similarity measure from:

Lin, Dekang (1998). An information-theoretic definition of similarity. In
Proceedings of the 15th International Conference on Machine Learning.
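
Lin's measure compares two words through the grammatical (dependency) contexts they share, weighting each context by its information content. A minimal sketch, assuming each word is already represented as a dict from a dependency feature to a positive information weight; the feature names and weights below are invented for illustration, and the parsing and weighting steps are not shown.

```python
def lin_similarity(feats_1, feats_2):
    """Lin (1998)-style similarity between two words.

    Each argument maps a dependency feature, e.g. ("subj_of", "shine"), to the
    positive information (mutual information) weight of that feature for the word.
    Shared features contribute the weights from both words; the result is
    normalised by the total weight of all features of both words, giving 0..1.
    """
    shared = feats_1.keys() & feats_2.keys()
    numerator = sum(feats_1[f] + feats_2[f] for f in shared)
    denominator = sum(feats_1.values()) + sum(feats_2.values())
    return numerator / denominator if denominator else 0.0


# Hypothetical feature weights; real ones would come from a parsed corpus.
star = {("subj_of", "shine"): 2.0, ("mod", "bright"): 1.5, ("obj_of", "meet"): 1.0}
sun = {("subj_of", "shine"): 2.5, ("mod", "bright"): 1.0, ("mod", "hot"): 1.5}
print(round(lin_similarity(star, sun), 3))  # 0.737
```

Ranking every other word by this score against a target gives that target's "nearest neighbours" in the thesaurus.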

We used the "nearest neighbour" words in the thesaurus and their 
distributional similarity to the target word as a guide to the frequency 
of the WordNet senses of the target word. We used the WordNet similarity 
package to associate these nearest neighbour words with the WordNet 
senses.
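
One way to turn a neighbour list into a sense ranking is to let each neighbour share out its distributional similarity among the target's senses in proportion to how similar each sense is to that neighbour in WordNet. The sketch below assumes both kinds of score are already computed; the weighting scheme, sense labels and numbers are assumptions for illustration and should not be read as the exact published scoring.

```python
def rank_senses(neighbours, wn_sim):
    """Rank the senses of a target word by an estimated prevalence score.

    neighbours: neighbour word -> distributional similarity to the target word.
    wn_sim: sense of the target -> {neighbour word -> WordNet similarity}.
    Each neighbour shares out its distributional similarity among the senses
    in proportion to how similar each sense is to that neighbour in WordNet.
    """
    scores = {sense: 0.0 for sense in wn_sim}
    for nb, dss in neighbours.items():
        total = sum(sims.get(nb, 0.0) for sims in wn_sim.values())
        if total == 0:
            continue  # neighbour unrelated to every sense: contributes nothing
        for sense, sims in wn_sim.items():
            scores[sense] += dss * sims.get(nb, 0.0) / total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical thesaurus entry for "star" and hypothetical WordNet scores.
neighbours = {"planet": 0.30, "sun": 0.25, "actor": 0.20}
wn_sim = {
    "star (celestial body)": {"planet": 0.8, "sun": 0.9, "actor": 0.1},
    "star (celebrity)": {"planet": 0.1, "sun": 0.1, "actor": 0.9},
}
for sense, score in rank_senses(neighbours, wn_sim):
    print(f"{sense}: {score:.3f}")
# star (celestial body): 0.512
# star (celebrity): 0.238
```

The top-ranked sense is the estimated predominant sense; normalising each neighbour's contribution keeps one very close neighbour from being counted more than once.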

We have evaluated the method against various resources, including a
manually tagged sample from the BNC (see Koeling et al., 2005. Domain-Specific
Sense Distributions and Predominant Sense Acquisition. In Proceedings of the
Joint Conference on Human Language Technology and Empirical Methods in
Natural Language Processing).
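
A simple way to score an automatically acquired predominant sense against a manually tagged sample is to measure how often that sense is the correct tag for a token. The sketch below assumes hypothetical (word, sense) pairs and predictions; it is not the evaluation code from the cited paper.

```python
from collections import Counter


def first_sense_accuracy(gold, predominant):
    """Share of tagged tokens whose gold sense equals the predicted predominant sense.

    gold: (word, sense) pairs from a manually tagged sample.
    predominant: word -> sense predicted to be the most frequent.
    Tokens of words with no prediction count as errors.
    """
    if not gold:
        raise ValueError("empty gold sample")
    correct = sum(predominant.get(word) == sense for word, sense in gold)
    return correct / len(gold)


def gold_predominant(gold):
    """Most frequent sense of each word in the tagged sample itself.

    Using these as predictions gives the best accuracy any one-sense-per-word
    heuristic can reach on that sample.
    """
    counts = {}
    for word, sense in gold:
        counts.setdefault(word, Counter())[sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


# Hypothetical tagged sample and hypothetical automatic predictions.
gold = [("star", "celestial"), ("star", "celestial"), ("star", "celebrity"),
        ("bank", "money"), ("bank", "money"), ("bank", "river")]
predicted = {"star": "celestial", "bank": "river"}
print(round(first_sense_accuracy(gold, predicted), 3))               # 0.5
print(round(first_sense_accuracy(gold, gold_predominant(gold)), 3))  # 0.667
```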

You are very welcome to have the manually tagged data described in that
paper if it is of any use to you.
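
The interleaved sampling procedure proposed in the question can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and `annotate` is a stand-in for the human judgement of which meaning an instance carries.

```python
import random


def sample_until_no_new_meanings(instances, annotate, sample_size=500, n_batches=10, seed=0):
    """Interleaved sampling: inspect batches until one adds no new meaning.

    instances: all corpus instances (e.g. concordance lines) of one word.
    annotate: stand-in for the human judgement; maps an instance to a meaning label.
    Batch k (0-based) holds sample positions k, k + n_batches, k + 2 * n_batches, ...
    so with the defaults batch 0 is the 1st, 11th, ..., 491st instance.
    Returns the meaning counts over the inspected batches and the number of
    batches inspected.
    """
    rng = random.Random(seed)
    sample = rng.sample(instances, min(sample_size, len(instances)))
    counts = {}
    for k in range(n_batches):
        found_new = False
        for inst in sample[k::n_batches]:
            meaning = annotate(inst)
            if meaning not in counts:
                counts[meaning] = 0
                found_new = True
            counts[meaning] += 1
        if k > 0 and not found_new:
            return counts, k + 1  # this batch added nothing new: stop
    return counts, n_batches


# Toy run: a word whose 1000 instances all carry the same meaning.
counts, inspected = sample_until_no_new_meanings(list(range(1000)), lambda i: "only")
print(counts, inspected)  # {'only': 100} 2
```

The counts cover only the inspected batches, and a meaning rarer than about 1 in 50 can easily be absent from a single batch, so the stopping rule says little about rare senses.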

Diana



On Fri, 17 Mar 2006, Ziwei Huang wrote:

> Hello, I have a methodological question that needs your kind help:
>
> I need to look at the frequently used meanings of about 200 different words in the BNC, and wonder whether there is a quick/easy way to do that.
>
> My intended way is to randomly select 500 instances of a word from the whole corpus, then
> look at 50 instances (say, the 1st, 11th, 21st ... 491st instances) and list the meanings (and their frequencies) of the word; then move to the next 50 instances (2nd, 12th ... 492nd) to see whether any new meanings have come out; if there are any, move on to the next 50 instances, and repeat until no more new meanings emerge.
>
> Can someone kindly tell me whether this approach is acceptable to describe the frequently used meanings of a word (or the 'default' meanings of a word in actual use), and whether there is any reference/source for this (or any other easy and quick) methodology?
>
> Many thanks!
>
> Mimi

-- 

Diana McCarthy,
Department of Informatics,
University of Sussex,
Falmer,
Brighton,
BN1 9QH.
-------------------------------------
http://www.informatics.susx.ac.uk/research/nlp/mccarthy/mccarthy.html
 ==========================================================================