[Corpora-List] frequent meanings of a word
Diana McCarthy
dianam at sussex.ac.uk
Fri Mar 17 19:25:25 UTC 2006
Hello Mimi
I assume from what you say that you are going to do this manually. How are
you going to define the meanings? Will just one person be doing the
annotation? You might want to consider how well humans agree on such a
task, e.g. look at measures of inter-tagger agreement on sense tagging such
as those used in the Senseval exercises.
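If two people tag the same instances, their agreement can be checked directly. Below is a minimal sketch. It assumes each annotator's tags are given as equal-length lists of sense labels. It computes raw observed agreement and Cohen's kappa, which corrects for agreement expected by chance. These are two common measures, but they are not claimed to be exactly the ones reported in Senseval. The function names and example labels are hypothetical.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of instances on which two annotators chose the same sense."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag lists of equal length")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohens_kappa(tags_a, tags_b):
    """Observed agreement corrected for the agreement expected by chance,
    estimated from each annotator's own distribution over senses."""
    n = len(tags_a)
    p_observed = observed_agreement(tags_a, tags_b)
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_chance = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both annotators used one identical sense throughout; kappa is
        # undefined here, so this sketch treats it as perfect agreement.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical tags for four instances of "bank" from two annotators.
annotator_1 = ["money", "money", "river", "money"]
annotator_2 = ["money", "river", "river", "money"]
print(observed_agreement(annotator_1, annotator_2))  # 0.75
print(cohens_kappa(annotator_1, annotator_2))        # 0.5
```

Kappa is lower than raw agreement here because both annotators mostly chose the same sense, so much of their agreement would be expected by chance.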
We (my colleagues at Sussex University and I) have been working on a way
to estimate automatically how frequent each meaning of a word is in a corpus
such as the BNC, using WordNet as our sense inventory. To do this we parsed
the BNC and then built a thesaurus from the parsed data using
distributional similarity:
Lin, Dekang (1998). An information-theoretic definition of similarity. In
Proceedings of the 15th International Conference on Machine Learning.
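The following is a rough illustration of building thesaurus neighbours by distributional similarity. The sketch scores word pairs in the style of Lin's information-theoretic measure applied to parser output. Each word is represented by (relation, word) features carrying positive association weights, for example pointwise mutual information. Similarity is the weight of the features two words share, divided by the total weight of both words' features. The feature representation, toy weights, and function names are assumptions for illustration. This is not the Sussex implementation.

```python
def lin_similarity(feats1, feats2):
    """Lin-style similarity between two words' feature vectors.

    feats1, feats2: dict mapping a (relation, word) feature to an association
    weight; only features with positive weight are counted.
    """
    pos1 = {f: w for f, w in feats1.items() if w > 0}
    pos2 = {f: w for f, w in feats2.items() if w > 0}
    shared = pos1.keys() & pos2.keys()
    numerator = sum(pos1[f] + pos2[f] for f in shared)
    denominator = sum(pos1.values()) + sum(pos2.values())
    return numerator / denominator if denominator else 0.0


def nearest_neighbours(target, vectors, k):
    """Rank every other word by similarity to target and keep the top k."""
    scored = [(other, lin_similarity(vectors[target], feats))
              for other, feats in vectors.items() if other != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical dependency features with made-up weights.
vectors = {
    "bank": {("obj-of", "rob"): 2.0, ("mod", "river"): 1.0, ("obj-of", "deposit"): 1.5},
    "lender": {("obj-of", "rob"): 1.0, ("obj-of", "deposit"): 1.0},
    "shore": {("mod", "river"): 2.0, ("mod", "sandy"): 1.0},
    "cat": {("subj-of", "purr"): 2.0},
}
neighbours = nearest_neighbours("bank", vectors, k=2)
print(neighbours)  # lender (about 0.85), then shore (0.4)
```

With these toy weights, `cat` shares no features with `bank`, so it scores 0 and drops out of the top two.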
We used the "nearest neighbour" words in the thesaurus and their
distributional similarity to the target word as a guide to the frequency
of the WordNet senses of the target word. We used the WordNet similarity
package to associate these nearest neighbour words with the WordNet
senses.
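One way to turn neighbours into a sense ranking is to let each neighbour's distributional similarity "vote" for the target's senses. Each vote is split according to how WordNet-similar each sense is to that neighbour. The sketch below assumes that weighting. It takes the WordNet-based sense/neighbour similarities as a precomputed table, standing in for the output of a WordNet similarity package. The sense labels and numbers are made up, and the code is not a reproduction of the published system.

```python
def sense_prevalence(neighbours, sense_sim):
    """Score each sense of a target word from its distributional neighbours.

    neighbours: list of (neighbour_word, distributional_similarity) pairs.
    sense_sim:  dict mapping sense -> {neighbour_word: WordNet-based similarity}
                (hypothetical precomputed values).
    """
    scores = {sense: 0.0 for sense in sense_sim}
    for word, dss in neighbours:
        total = sum(sims.get(word, 0.0) for sims in sense_sim.values())
        if total == 0:
            continue  # neighbour unrelated to every sense: contributes nothing
        for sense, sims in sense_sim.items():
            # Share this neighbour's score in proportion to its similarity to the sense.
            scores[sense] += dss * sims.get(word, 0.0) / total
    return scores


def sense_distribution(scores):
    """Normalise prevalence scores so they sum to 1."""
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()} if total else dict(scores)


neighbours = [("lender", 0.85), ("shore", 0.4)]
sense_sim = {
    "bank#financial": {"lender": 0.9, "shore": 0.1},
    "bank#river": {"lender": 0.1, "shore": 0.7},
}
scores = sense_prevalence(neighbours, sense_sim)
print(scores)                           # financial about 0.815, river about 0.435
print(sense_distribution(scores))       # roughly 0.65 / 0.35
print(max(scores, key=scores.get))      # bank#financial
```

With these toy values the financial sense is ranked as predominant. The normalised scores can be read as a rough estimate of the sense distribution.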
We have evaluated the method against various resources, including a
manually tagged sample from the BNC (see Koeling et al., 2005. Domain-Specific
Sense Distributions and Predominant Sense Acquisition. In Proceedings of the
Joint Conference on Human Language Technology and Empirical Methods in
Natural Language Processing).
You are very welcome to the manually tagged data described in that paper
if it is of any use to you.
Diana
On Fri, 17 Mar 2006, Ziwei Huang wrote:
> Hello, I have a methodological question and would appreciate your help:
>
> I need to look at the frequently used meanings of about 200 different words in the BNC, and wonder whether there is a quick and easy way to do that.
>
> My plan is to randomly select 500 instances of a word from the whole corpus, then
> look at 50 of them (say, the 1st, 11th, 21st ... 491st instances) and list the meanings of the word and their frequencies; then move on to the next 50 instances (2nd, 12th ... 492nd) to see whether any new meanings have come out; if there are any, move on to the next 50 instances, and repeat until no more new meanings emerge.
>
> Could someone tell me whether this is an acceptable way to describe the frequently used meanings of a word (or the 'default' meanings of a word in actual use), and whether there is any reference or source for this (or any other quick and easy) methodology?
>
> Many thanks!
>
> Mimi
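The interleaved-batch procedure in the question above can be sketched as follows. The annotation step is really a human judgement. Here it is a hypothetical `annotate` callable, and the concordance is just a list of instances. The function names, the seed, and the toy sense labels are assumptions.

```python
import random
from collections import Counter


def interleaved_batches(instances, n_batches):
    """Split a sample into interleaved batches: batch 0 holds the 1st, 11th,
    21st ... items, batch 1 the 2nd, 12th, 22nd ... (for n_batches=10)."""
    return [instances[i::n_batches] for i in range(n_batches)]


def sample_until_saturated(concordance, annotate, sample_size=500, n_batches=10, seed=0):
    """Annotate batches until one batch adds no previously unseen sense.

    concordance: list of all corpus instances of the word.
    annotate:    callable returning a sense label for one instance
                 (a hypothetical stand-in for manual tagging).
    Returns (sense counts over annotated instances, number of batches used).
    """
    rng = random.Random(seed)
    sample = rng.sample(concordance, min(sample_size, len(concordance)))
    counts = Counter()
    batches_used = 0
    for batch in interleaved_batches(sample, n_batches):
        seen_before = set(counts)
        counts.update(annotate(item) for item in batch)
        batches_used += 1
        if set(counts) <= seen_before:
            break  # this batch produced no new sense: stop
    return counts, batches_used


def toy_annotate(instance):
    """Hypothetical tagger: every fourth instance is the 'river' sense."""
    return "river" if instance % 4 == 0 else "money"


concordance = list(range(1000))  # stand-in for 1000 corpus hits
counts, used = sample_until_saturated(concordance, toy_annotate)
print(used, dict(counts))
```

The stopping rule has one caveat. A single batch with no new sense is weak evidence that none remain. A sense with a relative frequency of 2% is missed by a given batch of 50 with probability 0.98^50, roughly 0.36, so rare senses can easily go unnoticed.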
--
Diana McCarthy,
Department of Informatics,
University of Sussex,
Falmer,
Brighton,
BN1 9QH.
-------------------------------------
http://www.informatics.susx.ac.uk/research/nlp/mccarthy/mccarthy.html