[Corpora-List] EmoText - Software for opinion mining and lexical affect sensing

Justin Washtell lec3jrw at leeds.ac.uk
Tue Dec 20 20:31:49 UTC 2011


Michal and Alexander,



I thoroughly agree with Michal (and Graham) that these kinds of demo are a good thing, and despite my - ongoing - criticisms, I'd like to take my hat off to Alexander for sharing this work. There are already countless papers describing technical approaches to this-and-that, and showing impressive-looking results achieved upon [perhaps sometimes carefully selected or tuned-to] test datasets. But I suspect that there's presently no better way to get a feel for where the state-of-the-art really is (and to shed some qualitative light on matters) than by complementing these works with some inquisitive and unrestrained hands-on tinkering.



> I tried a couple of reviews from Amazon. Among the different feature sets from 1 to 6, one is always close to Amazon's ranking, but unfortunately it's never one feature set in particular, but rather randomly one of the six. Besides the closest method, all the others are usually reversed (e.g., if the closest method gives 5 stars, all the others give 1). However, this might just have happened for the couple of examples I tried (reviews of the Kindle on Amazon).



Isn't that more-or-less what one would expect from random output?



> Social aspects. You have to consider that the reviews on Amazon are written by different authors, each with their own style of writing. Moreover, you have to consider different cultural backgrounds; for example, Americans and Englishmen use different words to express the same things. Goethe used different words than a truck driver does.



As a human, and an Englishman, I expect I can understand and fairly judge the sentiment of most reviews written by, say, an American truck driver, without undue reprogramming. Is this really an unrealistic goal for our algorithms? And I wonder, is mastering a highly restricted style or register a necessary step in that direction... or is it in fact a detour?



> How can a classifier calculate a weight of a lexical feature if this lexical feature is not present in the analyzed text?



By inferring from similarities between that feature and those that *are* present (e.g. through semi-supervised learning/bootstrapping over unannotated data)? That's at least one method about which a fair amount has been written already. I'm not saying it's a solved problem, mind you, but perhaps you're not up against a brick wall yet?
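To make that concrete, here is a minimal sketch of the idea: infer a sentiment weight for a word absent from the seed lexicon by comparing its distributional context to that of known seed words. The corpus, seed lexicon, and all function names are invented for illustration; a real system would use a large unannotated corpus and a proper similarity model.

```python
from collections import Counter
from math import sqrt

# Toy unannotated corpus (purely illustrative): each "document" is a
# tokenised snippet.
corpus = [
    "the battery life is superb and the screen is great".split(),
    "the battery died quickly which is terrible and annoying".split(),
    "a great screen and superb sound".split(),
    "terrible build quality and an annoying interface".split(),
]

# Hypothetical seed lexicon with known weights (the supervised part).
seeds = {"superb": 1.0, "great": 1.0, "terrible": -1.0, "annoying": -1.0}

def context_vector(word, window=3):
    """Bag of words co-occurring with `word` within +/- `window` tokens."""
    vec = Counter()
    for doc in corpus:
        for i, tok in enumerate(doc):
            if tok == word:
                lo, hi = max(0, i - window), min(len(doc), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[doc[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def infer_weight(word):
    """Similarity-weighted average of the seed weights for an unseen word."""
    wv = context_vector(word)
    sims = {s: cosine(wv, context_vector(s)) for s in seeds}
    total = sum(sims.values())
    return sum(sims[s] * seeds[s] for s in seeds) / total if total else 0.0
```

On a real corpus one would iterate this (adding confidently scored words back into the lexicon), which is the bootstrapping step proper; the sketch above shows only a single pass.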



Justin Washtell
University of Leeds

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
