[Corpora-List] EmoText - Software for opinion mining and lexical affect sensing

Taras taras8055 at gmail.com
Wed Dec 21 14:31:22 UTC 2011


Right. You can also add temporal dependency: a trained classifier may
become obsolete within several weeks or months. Rules may be less
time-dependent, but do not expect high recall from them.
But the companies want it 'made once, used forever and applied to
everything'.

Taras

On 21/12/11 14:23, iain wrote:
> My own suspicion in terms of what Taras was saying is that the expense is in
> making the classifier suitable for a domain or genre.  It is quite clear
> that something which can decode tweets may not work too well on blogs or
> forums like this one!
>
> Equally, a general classifier won't be as effective on a sub-language as the
> one it's trained on.  Scientific vitriol can be couched in terms which only
> the very adept outsider will recognise!
>
>
> Iain
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Michal Ptaszynski
> Sent: 21 December 2011 13:35
> To: corpora at uib.no; corpora-request at uib.no
> Cc: ptaszynski at hgu.jp
> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
> lexical affect sensing
>
> Dear Taras, Iain
> Dear All,
>
> What Iain and Taras say is one of the best things I've heard lately, mostly
> because it confirms my findings too. However, your experience is probably
> based more on real-world examples.
> If you could provide a proof of some kind, or a description of some
> examples, this would be a very useful hint.
>
> Just a word on "keeping classification cheap". I think this is not as much
> about the money, as it is about logic (and trying to find it).
> For example, it is not much of a research contribution to just have, say, 100
> people write a lot of rules. Even if a system created this way achieved
> high performance, it's not too interesting from the scientific point of view.
>
> What we, researchers, try to find is a kind of logical reasoning that could
> be represented computationally. So, for example, if Mr.X has a
> 1000-rule-system that gives him 85% accuracy, and Mr.Y has a
> 10-(general)-rule-system that gives him 82% accuracy, a researcher would
> be interested in Mr.Y's system first.
>
> I think this applies to all fields that have their commercial variations.
> For example, each year there are a number of papers on machine translation
> presenting high results, but the level of actual machine translation
> software available on the market is rather low (as a former translator I
> tried about 5 different ones).
>
> Best,
>
> Michal
>
>
> ---------------------
> From: Taras<taras8055 at gmail.com>
> To: corpora at uib.no
> Date: Wed, 21 Dec 2011 10:43:47 +0000
> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
> lexical affect sensing
>
>
> Hi
> I am a developer of one of the commercial tools. And I think there are two
> major problems that prevent them from being more accurate:
> 1. They try to keep classification cheap. Cheap means generic. But the
> only way of getting good sentiment accuracy is making classifiers
> specific. But this is expensive.
> 2. The other problem is the neutrality bias. Most texts are neutral or
> balanced, which makes extracting the non-neutral ones very difficult.
> The problem actually is not subjectivity or sentiment
> classification taken separately, but the combination of the two.
>
> Of course there are other problems: noisy language, various ways of
> expressing sentiment etc. But the two aforementioned are the most
> business-specific ones.
>
> Regards,
>
> Taras Zagibalov
>
> On 21/12/11 09:52, iain wrote:
> I've been following this thread with interest.  I'm a commercial semi-lurker
> rather than an involved theorist, but my colleagues and I have done some
> work with some of the available commercial sentiment tools.
>
> Our experience is that they are really not very accurate.
>
> There are some issues with evaluating them.  I'm not using a pre-marked gold
> standard to score them, but rather submitting text from web pages to them
> and comparing the output with the text, which makes our results far from
> scientific!  And we've done dozens, not thousands.
>
> What we tend to find is that we look at the output and then at the text and
> more often than not say, 'uh'.  Pretty much as the reviewers of Alexander's
> test site have been doing!  Which might make Alexander's work close to the
> commercial state of the art  :-J   ....
>
> Some of the reviews of commercial tools I've seen seem to indicate that if
> you take the 'neutral' sentiment articles out then the actual accuracy drops
> down from the claimed 70% quite considerably.  In short, the tools are very
> good at detecting no sentiment but rather poorer at getting actual sentiment
> right.
>
> I was wondering if anyone on the list had experience with the commercial
> tools and what sort of results they found.  Could they recommend one or
> another of the suppliers?  I'd also be interested if any tool suppliers
> (also commercially semi-lurking) might have some input to this - what are
> their real expectations of quality?
>
>
> Iain
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Justin Washtell
> Sent: 20 December 2011 20:32
> To: Alexander Osherenko; ptaszynski at ieee.org
> Cc: corpora at uib.no; corpora-request at uib.no
> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
> lexical affect sensing
>
> Michal and Alexander,
>
>
>
> I thoroughly agree with Michal (and Graham) that these kinds of demo are a
> good thing, and despite my - ongoing - criticisms, I'd like to take my hat
> off to Alexander for sharing this work. There are already countless papers
> describing technical approaches to this-and-that, and showing
> impressive-looking results achieved upon [perhaps sometimes carefully
> selected or tuned-to] test datasets. But I suspect that there's presently no
> better way to get a feel for where the state-of-the-art really is (and to
> shed some qualitative light on matters) than by complementing these works
> with some inquisitive and unrestrained hands-on tinkering.
>
>
>
> I tried a couple of reviews from Amazon. Among the different feature sets from
> 1 to 6, one is always close to Amazon's ranking, but unfortunately it's
> never one feature set in particular, but rather randomly one of the six.
> Besides the closest method, all the others are usually reversed (e.g., if the
> closest method gives 5 stars, all the others give 1). However, this might
> just have happened for the couple of examples I tried (reviews of Kindle on
> Amazon).
>
>
>
> Isn't that more-or-less what one would expect from random output?
>
>
>
> Social aspects. You have to consider that the reviews from Amazon are
> composed by different authors who have their own styles of writing.
> Moreover, you have to consider different cultural backgrounds; for example,
> Americans and Englishmen use different words to express the same things. Goethe
> used different words than a truck driver does.
>
>
>
> As a human, and an Englishman, I expect I can understand and fairly judge
> the sentiment of most reviews written by, say, an American truck driver,
> without undue reprogramming. Is this really an unrealistic goal for our
> algorithms? And I wonder, is mastering a highly restricted style or register
> a necessary step in that direction... or is it in fact a detour?
>
>
>
> How can a classifier calculate a weight of a lexical feature if this
> lexical feature is not present in the analyzed text?
>
>
>
> By inferring from similarities between that feature and those that *are*
> present (e.g. through semi-supervised learning/bootstrapping of unannotated
> data)? That's at least one method about which a fair amount has been written
> already. I'm not saying it's a solved problem mind you, but perhaps you're
> not up against a brick wall yet?
>
>
>
> Justin Washtell
> University of Leeds
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


