[Corpora-List] Automatic categorization of words.

Cyrus Shaoul cyrus.shaoul at ualberta.ca
Thu Mar 10 03:50:09 UTC 2005


Hi again, Listers,

Thanks for all your ideas and suggestions. For the benefit of the list,
here is a short summary of what I learned.

*****
Some people felt that WordNet might have some relevance:

For example, a word could be treated as concrete if 'object, physical
object' appears in its hypernym tree, provided the corpus is annotated
with word sense disambiguation (WSD) info.
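
A minimal sketch of that hypernym check, for illustration only. The tiny
taxonomy and the sense labels (such as "dog#1") are made up for this
example and are not real WordNet identifiers; a real version would look
the chain up in WordNet and would rely on the WSD annotation to pick the
sense.

```python
# Toy hypernym links (child -> parent). Hypothetical stand-in for WordNet.
HYPERNYM = {
    "dog#1": "animal",
    "animal": "organism",
    "organism": "physical object",
    "physical object": "entity",
    "idea#1": "abstraction",
    "abstraction": "entity",
}


def hypernym_chain(sense):
    """Return the sense followed by all of its ancestors, root last."""
    chain = [sense]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_concrete(sense, marker="physical object"):
    """True if the marker node appears in the sense's hypernym chain."""
    return marker in hypernym_chain(sense)


if __name__ == "__main__":
    for sense in ("dog#1", "idea#1"):
        print(sense, hypernym_chain(sense), is_concrete(sense))
```

Note that this gives a yes/no answer per word sense, so it inherits the
problems of any dichotomous classification.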

*****


Also, there was a pointer to the paper:

D. Freitag, "Toward Unsupervised Whole-Corpus Tagging," Proceedings of
Coling 2004.

and http://clg.wlv.ac.uk/demos/similarity/

and suggestions to use clustering to classify words, starting from some
examples of concrete and abstract nouns. See also the tool "WEKA".

*****

There was the idea to look at the Regressive Imagery Dictionary at:

         http://www.simstat.com/WordStat/RID.htm

And a pointer to:

Martindale, C. (1990). The clockwork muse: The predictability of
artistic change. New York: Basic Books.

*****

Finally, Dominic Widdows pointed out to the list, quite correctly, that
there are very few words that are purely abstract or concrete, and that
any dichotomous classification is bound to have many problems.

*****

I should refine my question: I would like to rate words using a
continuous measure of "concreteness" or "abstractness", not classify or
categorize them. (I wish I had said this in my original message!!! Can't
take back those electrons...)

I will look into clustering, and into measuring the relative proximity
of words to each cluster, as one possibility.
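
One way the proximity idea could yield a continuous score, as a rough
sketch only. It swaps a real clustering step for two centroids built
from hand-picked seed words, and it assumes each word already has a
numeric feature vector (for instance, co-occurrence counts). The vectors
and function names below are invented for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def concreteness(vec, concrete_centroid, abstract_centroid):
    """Score in [0, 1]: 1 at the concrete centroid, 0 at the abstract one."""
    d_concrete = distance(vec, concrete_centroid)
    d_abstract = distance(vec, abstract_centroid)
    total = d_concrete + d_abstract
    if total == 0:
        return 0.5  # both centroids coincide with the word; no information
    return d_abstract / total


# Hypothetical 2-d feature vectors for seed nouns.
concrete_seeds = [[1.0, 0.0], [0.8, 0.2]]
abstract_seeds = [[0.0, 1.0], [0.2, 0.8]]
c_con = centroid(concrete_seeds)
c_abs = centroid(abstract_seeds)

if __name__ == "__main__":
    for vec in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
        print(vec, round(concreteness(vec, c_con, c_abs), 3))
```

The score depends only on the ratio of the two distances, so a word
sitting halfway between the seed groups gets a middling value rather
than being forced into one class.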

Thanks to all for your time and help. Hope I can do the same for you one
day.

Cyrus


