[Corpora-List] EmoText - Software for opinion mining and lexical affect sensing
Alexander Osherenko
osherenko at gmx.de
Tue Dec 20 15:19:12 UTC 2011
Hi Michal,
thanks for your comments. Very interesting!
1. Social aspects. You have to consider that the reviews from Amazon are
written by different authors, each with their own style of writing.
Moreover, you have to consider different cultural backgrounds: for example,
Americans and Englishmen use different words to express the same things.
Goethe used different words than a truck driver does. How can a classifier
calculate a weight for a lexical feature if this lexical feature is not
present in the analyzed text?
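
To illustrate the question above, here is a minimal sketch, assuming a
simple bag-of-words representation. The lexicon, the two example sentences,
and the helper name bag_of_words are hypothetical and are not taken from
EmoText. It only shows that a text written in another vocabulary yields
all-zero features, so learned weights have nothing to act on.

```python
from collections import Counter

def bag_of_words(text, lexicon):
    """Count how often each lexicon word occurs in the text (0 if absent)."""
    counts = Counter(text.lower().split())
    return {word: counts[word] for word in lexicon}

# Hypothetical lexicon, as if learned from one author's reviews.
lexicon = ["awesome", "terrible", "movie"]

# Two invented sentences expressing roughly the same opinion.
american = "an awesome movie"
british = "a brilliant film"

us_features = bag_of_words(american, lexicon)
uk_features = bag_of_words(british, lexicon)

print(us_features)  # two of the three lexicon words are present
print(uk_features)  # none of the lexicon words are present
```
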
In my demo, the author is an American, James Berardinelli. He has his own
style of expressing opinions. Another person would do it in a different
manner. In the case of Amazon reviewers, several people express their
opinions about the same thing. Hence, the weights in statistical
classifiers can be deceptive because they are calculated for a whole
community of different reviewers. I assume you have to compose individual
datasets for people of each cultural background, or you have to use a
majority or average vote to calculate a general vote.
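
The two ways of combining votes mentioned above could be sketched as
follows. This is only an illustration: the function names, the labels, and
the numeric ratings are assumptions, and it is not how EmoText combines
anything.

```python
from collections import Counter
from statistics import mean

def majority_vote(labels):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]

def average_vote(scores):
    """Return the arithmetic mean of numeric ratings."""
    return mean(scores)

# Hypothetical outputs of three classifiers, one per reviewer group.
predictions = ["positive", "positive", "negative"]
ratings = [4, 2, 3]

general_label = majority_vote(predictions)
general_rating = average_vote(ratings)
```

A majority vote suits categorical labels, while an average vote only makes
sense if the classes lie on an ordered numeric scale.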
Moreover, the datasets I used for learning are composed of grammatically
correct texts, not of weblogs with their characteristics such as
repetitions and so on. I describe the differences in more detail in my
thesis. For example, I assume that POS tagging with TreeTagger works
better on literary texts.
2. Sparse data. The datasets that underlie my demo contain 215 instances
for a 9-class problem. That is not much. That is why your feeling and mine,
that the probabilistic NaiveBayes performs better, may be correct. It is
much quicker in any case. A classifier such as an analytical SVM can use
more texts, but then you have to consider overfitting.
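
As a rough back-of-the-envelope check of how sparse this is, the sketch
below computes the average number of instances per class. It assumes the
classes are evenly balanced, which is an assumption made here only for
illustration.

```python
# Figures from the text: 215 instances, 9 classes.
instances = 215
classes = 9

# Average instances per class, assuming balanced classes (an assumption).
per_class = instances / classes

print(f"about {per_class:.1f} instances per class")
```
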
Best
Alexander