[Corpora-List] EmoText - Software for opinion mining and lexical affect sensing

Alexander Osherenko osherenko at gmx.de
Wed Dec 21 18:24:50 UTC 2011


Hi,

> I tried a couple of reviews from Amazon. Among the different feature sets
> from 1 to 6, one is always close to Amazon's ranking, but unfortunately
> it's never one feature set in particular, but rather randomly one of the six.
> Besides the closest method, all others are usually reversed (e.g., if the
> closest method gives 5 stars, all others give 1). However, this might have
> just happened for the couple of examples I tried (reviews of the Kindle on Amazon).
>


> Isn't that more-or-less what one would expect from random output?
>
It can be considered random if classification is performed on weblogs
while the classifiers are trained on grammatically correct movie
reviews. Actually, the recognition rate of my approach for almost all
corpora I studied in my thesis is about triple the random baseline. For
example, for a 9-class problem the chance accuracy is 11.1%; my approach
achieves about 34%, and so on. But you have to classify a review that
conforms to the style of the texts used for training the classifier.
Otherwise, you get unreliable results.
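For concreteness, the baseline arithmetic can be sketched like this (a toy illustration only; the 11.1% and 34% figures are the ones quoted in this message, not output of the actual classifier):

```python
# Chance baseline for a k-class problem vs. the recognition rate
# quoted above (toy arithmetic, not the actual classifier).
def chance_baseline(num_classes):
    """Accuracy of guessing uniformly at random over num_classes labels."""
    return 1.0 / num_classes

baseline = chance_baseline(9)   # 9-class problem -> about 11.1%
reported = 0.34                 # recognition rate mentioned above
print(f"{baseline:.1%} chance vs {reported:.0%} reported "
      f"({reported / baseline:.1f}x the baseline)")
```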


> As a human, and an Englishman, I expect I can understand and fairly judge
> the sentiment of most reviews written by, say, an American truck driver,
> without undue reprogramming. Is this really an unrealistic goal for our
> algorithms? And I wonder, is mastering a highly restricted style or
> register a necessary step in that direction... or is it in fact a detour.
>
As a human and as an Englishman, you learned to recognize particular words
of the English language. Now you understand English in every country, but
you can't necessarily comprehend it. Understanding is only the first step
of cognition; comprehension takes much more time and energy. Or can you
explain the most severe problems of American truck drivers nowadays? Or
tell me what problems you would discuss with an American truck driver? In
terms of data mining, this means: you know what features you have in a
dataset, but you don't know their weights. In my opinion, if you want to
learn the weights, you have to live in the country and tune them.

I don't think that we should worry about reprogramming -- first of all, we
can be happy that at least Naive Bayes or SVM classifiers label texts more
or less realistically. In my demo, I maintain about 30 classifiers that
were trained on lexical, stylometric, deictic, and grammatical feature
sets. You can look over the framework I use for this purpose (
www.socioware.de/technology.html). AO
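To make the lexical-feature idea concrete, here is a toy sketch of one such classifier -- a multinomial Naive Bayes over bag-of-words counts with add-one smoothing. This is an illustration only, not the socioware.de framework; the reviews and labels are made up:

```python
import math
from collections import Counter

# Toy multinomial Naive Bayes over bag-of-words counts,
# with add-one (Laplace) smoothing. Illustration only.
def train(docs):
    """docs: list of (token_list, label). Returns model parameters."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label, n_docs in class_counts.items():
        lp = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for tok in tokens:
            # add-one smoothing; unseen words get a small uniform weight
            lp += math.log((word_counts[label][tok] + 1)
                           / (n_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [("a wonderful moving film".split(), "pos"),
        ("great acting and a great script".split(), "pos"),
        ("a dull lifeless movie".split(), "neg"),
        ("terrible plot and terrible pacing".split(), "neg")]
model = train(docs)
print(predict(model, "a wonderful script".split()))
```

Note how the smoothing term is exactly where the "absent feature" problem shows up: a word never seen with a class contributes only its smoothed pseudo-count, not a learned weight.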

>
>
> > How can a classifier calculate a weight of a lexical feature if this
> lexical feature is not present in the analyzed text?
>
>
>
> By inferring from similarities between that feature and those that *are*
> present (e.g. through semi-supervised learning/bootstrapping of unannotated
> data)? That's at least one method about which a fair amount has been
> written already. I'm not saying it's a solved problem, mind you, but perhaps
> you're not up against a brick wall yet?
>
>
>
> Justin Washtell
> University of Leeds
>
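To make the bootstrapping idea from the quoted reply concrete, here is a toy self-training sketch in which unseen words inherit weights from the seed or already-weighted words they co-occur with. The seed lexicon and corpus are made up for illustration; real semi-supervised methods are considerably more careful:

```python
from collections import defaultdict

# Toy self-training: unseen words inherit polarity from the labeled
# texts they co-occur with. Seed lexicon and corpus are made up.
seed = {"good": 1.0, "bad": -1.0}
unlabeled = ["good solid phone", "bad flimsy case", "solid sturdy build"]

weights = dict(seed)
for _ in range(2):  # a couple of bootstrapping passes
    sums, counts = defaultdict(float), defaultdict(int)
    for text in unlabeled:
        tokens = text.split()
        known = [weights[t] for t in tokens if t in weights]
        if not known:
            continue  # no evidence for this text yet; skip it
        score = sum(known) / len(known)  # average document polarity
        for t in tokens:
            if t not in seed:  # never overwrite the seed lexicon
                sums[t] += score
                counts[t] += 1
    for t in sums:
        weights[t] = sums[t] / counts[t]

print(sorted(weights))
```

After the first pass "solid" acquires a weight from "good solid phone"; in the second pass that weight propagates onward to "sturdy" and "build", which never co-occur with a seed word at all -- which is roughly the sense in which a classifier can assign a weight to a feature absent from the original labeled data.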