<span class="Apple-style-span" style>Hi Michal,</span><div style><br></div><div style>thanks for your comments. Very interesting!<br><div><br></div><div>1. Social aspects. You have to consider that the reviews from Amazon are composed by different authors that have their own style of writing. Moreover, you have to consider different cultural background, for example, Americans and Englishmen use different words to express same things. Goethe used other words than a truck driver does. How can a classifier calculate a weight of a lexical feature if this lexical feature is not present in the analyzed text?</div>

In my demo, the author is a single American, James Berardinelli. He has his own style of expressing opinions; another person would do it in a different manner. In the case of the Amazon reviewers, several people express their opinions about the same thing. Hence, the weights in the statistical classifiers can be deceptive, because they are calculated over a community of different reviewers. I assume you either have to compose individual datasets for people of each cultural background, or you have to use a majority or average vote to compute one overall rating.

Moreover, the datasets I used for training are composed of grammatically correct texts, not of weblogs with their characteristic features such as character repetitions and so on. I describe the differences in more detail in my thesis. For example, I assume that POS-tagging with TreeTagger works better on literary texts.
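
One common workaround for such weblog phenomena (my own illustration, not part of the demo pipeline) is to normalize character repetitions before tagging, for example by collapsing runs of three or more identical characters to two:

    import re

    def normalize_repeats(text):
        # Collapse runs of 3+ identical characters to 2,
        # e.g. "soooo goooood!!!!" -> "soo good!!".
        return re.sub(r"(.)\1{2,}", r"\1\1", text)

    print(normalize_repeats("soooo goooood!!!!"))

Keeping two characters rather than one is a design choice: it preserves legitimate doubled letters, as in "good".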

2. Sparse data. The datasets that underlie my demo contain 215 instances for a 9-class problem, i.e. roughly 24 instances per class. That is not much. That is why your impression and mine that the probabilistic NaiveBayes performs better may well be correct; it is in any case much quicker. A classifier such as an SVM can make use of more texts, but then you have to watch out for overfitting.
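
As a toy sanity check (my own sketch using scikit-learn, not the demo's actual setup), one could compare the two classifiers by cross-validation; with so few instances per class, the fold scores are noisy, which is the sparse-data problem in miniature:

    import time
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    # Stand-in corpus; in reality these would be the 215 review instances.
    docs = ["great brilliant film", "awful boring film",
            "brilliant masterpiece", "boring awful mess",
            "great fun", "dull mess"] * 10
    labels = [1, 0, 1, 0, 1, 0] * 10

    X = CountVectorizer().fit_transform(docs)
    for clf in (MultinomialNB(), LinearSVC()):
        t0 = time.time()
        scores = cross_val_score(clf, X, labels, cv=5)
        print("%-13s mean acc %.3f  (%.3fs)"
              % (type(clf).__name__, scores.mean(), time.time() - t0))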

Best
Alexander