[Corpora-List] corpus of rated words

Shahzad Khan sk453 at cam.ac.uk
Mon Jun 4 03:51:38 UTC 2007


Dear Claire, I concur with Oliver that approaching this problem _directly_ 
is futile. The difficulty lies in the fact that the meaning of a word 
depends on its context. However, I'd suggest taking a look at the 
following paper:

Esuli, Andrea and Fabrizio Sebastiani. 2006. SentiWordNet: A Publicly 
Available Lexical Resource for Opinion Mining. In Proceedings of the 5th 
Conference on Language Resources and Evaluation (LREC 2006), 22-28 May 
2006, Genova, Italy.

It's probably the closest you'll get to what you are looking for at the 
moment. Read Section 2.3, 'Some statistics', carefully: you'll note that 
most words cannot be categorized as strictly positive or strictly 
negative.

Also refer to Charlotta Engstrom's thesis, where she details some of the 
cross-genre issues you could face when dealing with terms/features/words 
extracted for sentiment classification:

Engstrom, Charlotta. 2004. Topic Dependence in Sentiment Classification. 
Master's thesis, St Edmund's College, University of Cambridge.

That said, I believe that if you model each term together with contextual 
attributes, such as sets of related words, genre, predicating verbs, and 
the nouns being modified, you may be on your way to cracking this. On its 
own, though, a word is not very useful.

It is definitely an interesting research problem.

- Shahzad


-- 
Shahzad Khan
Ph.D. Candidate
Natural Language and Information Processing Group
Computer Laboratory
University of Cambridge (UK)
