[Corpora-List] corpus of rated words
Shahzad Khan
sk453 at cam.ac.uk
Mon Jun 4 03:51:38 UTC 2007
Dear Claire, I'm going to concur with Oliver in saying that approaching
this problem _directly_ is a futile exercise. The complexity lies in the
fact that the meaning of a word depends on its context. However, I'd take a
look at the following paper:
Andrea Esuli and Fabrizio Sebastiani. (2006). SENTIWORDNET: A Publicly
Available Lexical Resource for Opinion Mining. 5th Conference on Language
Resources and Evaluation, 22-28/5/2006, Genova (IT)
It's probably the closest that you'll get to what you are looking for at
the moment. I urge you to read section 2.3 'Some statistics' carefully, and
you'll note that most words cannot be absolutely categorized as being
negative or positive.
Also refer to Charlotta's thesis, where she details some of the inter-genre
issues that you could face when dealing with terms/features/words
extracted for sentiment classification:
Engstrom, Charlotta. 2004. Topic Dependence in Sentiment Classification.
Master's thesis, St Edmund's College, University of Cambridge.
Having said all that, I do believe that if you model the terms together
with supporting attributes such as sets of related words, genre,
predicating verbs, the nouns being modified, etc., you may be on your way to
cracking this. A word on its own, though, is not very useful.
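A minimal sketch of that idea, in Python, might look like the following.
Everything in it is an assumption made for illustration: the Context
fields, the back-off order, and above all the scores, which are invented
and come from no real lexicon. It only shows that the same word can be
scored differently once genre and the modified noun are part of the key.

```python
from typing import Dict, NamedTuple, Optional


class Context(NamedTuple):
    """Hypothetical context attributes attached to a term."""
    genre: Optional[str] = None
    head_noun: Optional[str] = None  # the noun the term modifies


# Invented scores in [-1, 1], for illustration only.
LEXICON: Dict[str, Dict[Context, float]] = {
    "unpredictable": {
        Context(): 0.0,  # no context: no clear polarity
        Context(genre="film", head_noun="plot"): 0.6,
        Context(genre="cars", head_noun="steering"): -0.7,
    },
}


def polarity(word: str, ctx: Context) -> float:
    """Look up a score, backing off from the full context to none."""
    entries = LEXICON.get(word, {})
    candidates = (
        ctx,
        Context(genre=ctx.genre),
        Context(head_noun=ctx.head_noun),
        Context(),
    )
    for candidate in candidates:
        if candidate in entries:
            return entries[candidate]
    return 0.0  # unknown word: treated as neutral in this sketch


if __name__ == "__main__":
    print(polarity("unpredictable", Context("film", "plot")))
    print(polarity("unpredictable", Context("cars", "steering")))
    print(polarity("unpredictable", Context()))
```

The back-off to an empty context is one possible design choice among
many; a real system would more likely learn these scores from labelled
data than store them by hand.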
It is definitely an interesting research problem.
- Shahzad
--
Shahzad Khan
Ph.D. Candidate
Natural Language and Information Processing Group
Computer Laboratory
University of Cambridge (UK)