[Corpora-List] Corpora Digest, Vol 54, Issue 31
Daoud Clarke
daoud.clarke at gmail.com
Wed Dec 21 11:22:44 UTC 2011
Hi,
As another developer of commercial tools (for a different company), I
would echo Taras's comments. Anyone interested in our approach might
like to read our paper at WASSA this year:
http://hdl.handle.net/2299/5811
Daoud
> From: Taras <taras8055 at gmail.com>
> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
> lexical affect sensing
> To: corpora at uib.no
>
> Hi
> I am a developer of one of the commercial tools, and I think there are two
> major problems that prevent them from being more accurate:
> 1. They try to keep classification cheap, and cheap means generic. The only
> way to get good sentiment accuracy is to make classifiers domain-specific,
> but that is expensive.
> 2. The other problem is the neutrality bias. Most texts are neutral or
> balanced, which makes extracting the non-neutral ones very difficult. The
> hard part is not subjectivity classification or sentiment classification
> taken separately, but the combination of the two (see the sketch below).
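
A minimal sketch of the two-stage setup described above: a subjectivity stage in
front of a polarity stage, so that only texts judged subjective ever receive a
positive/negative label. The scikit-learn pipeline, example texts and labels are
invented placeholders, not taken from any of the tools under discussion.

    # Hypothetical two-stage sentiment setup: stage 1 separates neutral from
    # subjective text; stage 2 assigns polarity only to the subjective texts.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["great battery life", "arrived on tuesday", "screen broke in a week"]
    subjectivity = ["subjective", "neutral", "subjective"]

    # Stage 1: subjective vs. neutral, trained on all texts.
    subj_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    subj_clf.fit(texts, subjectivity)

    # Stage 2: polarity, trained only on the subjective texts.
    subj_texts = [t for t, s in zip(texts, subjectivity) if s == "subjective"]
    pol_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    pol_clf.fit(subj_texts, ["positive", "negative"])

    def classify(text):
        # Neutral texts never reach the polarity stage.
        if subj_clf.predict([text])[0] == "neutral":
            return "neutral"
        return pol_clf.predict([text])[0]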
>
> Of course there are other problems: noisy language, various ways of
> expressing sentiment etc. But the two aforementioned are the most
> business-specific ones.
>
> Regards,
>
> Taras Zagibalov
>
> On 21/12/11 09:52, iain wrote:
>> I've been following this thread with interest. I'm a commercial semi-lurker
>> rather than an involved theorist, but my colleagues and I have done some
>> work with some of the available commercial sentiment tools.
>>
>> Our experience is that they are really not very accurate.
>>
>> There are some issues with evaluating them. I'm not scoring them against a
>> pre-annotated gold standard, but rather submitting text from web pages and
>> comparing the output with the text by eye, which makes our results far from
>> scientific! And we've done dozens of documents, not thousands.
>>
>> What we tend to find is that we look at the output and then at the text and
>> more often than not say, 'uh'. Pretty much as the reviewers of Alexander's
>> test site have been doing! Which might make Alexander's work close to the
>> commercial state of the art :-J ....
>>
>> Some of the reviews of commercial tools I've seen seem to indicate that if
>> you take the 'neutral' articles out, the accuracy drops quite considerably
>> from the claimed 70%. In short, the tools are very good at detecting the
>> absence of sentiment but rather poorer at getting the actual sentiment
>> right.
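
To see how sharp that drop can be, a back-of-the-envelope calculation; the
input figures below are assumptions chosen for illustration, not measurements
of any tool.

    # If most of the feed is neutral and the tool handles neutrals well,
    # a 70% overall accuracy leaves little room on the polar articles.
    overall_acc = 0.70      # claimed overall accuracy (assumed)
    p_neutral = 0.60        # assumed share of neutral articles
    acc_on_neutral = 0.95   # assumed accuracy on the neutral articles

    p_polar = 1 - p_neutral
    acc_on_polar = (overall_acc - p_neutral * acc_on_neutral) / p_polar
    print(f"implied accuracy on non-neutral articles: {acc_on_polar:.0%}")
    # -> roughly a third, i.e. not far from chance on a 3-way decision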
>>
>> I was wondering if anyone on the list has experience with the commercial
>> tools and what sort of results they found. Could they recommend one or
>> another of the suppliers? I'd also be interested if any tool suppliers
>> (also commercially semi-lurking :) ) might have some input on this - what
>> are their real expectations of quality?
>>
>>
>> Iain
>> -----Original Message-----
>> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
>> Justin Washtell
>> Sent: 20 December 2011 20:32
>> To: Alexander Osherenko; ptaszynski at ieee.org
>> Cc: corpora at uib.no; corpora-request at uib.no
>> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
>> lexical affect sensing
>>
>> Michal and Alexander,
>>
>> I thoroughly agree with Michal (and Graham) that these kinds of demo are a
>> good thing, and despite my - ongoing - criticisms, I'd like to take my hat
>> off to Alexander for sharing this work. There are already countless papers
>> describing technical approaches to this-and-that, and showing
>> impressive-looking results achieved upon [perhaps sometimes carefully
>> selected or tuned-to] test datasets. But I suspect that there's presently no
>> better way to get a feel for where the state-of-the-art really is (and to
>> shed some qualitative light on matters) than by complementing these works
>> with some inquisitive and unrestrained hands-on tinkering.
>>
>>> I tried a couple of reviews from Amazon. Among the different feature sets
>> 1 to 6, one is always close to the Amazon rating, but unfortunately it's
>> never the same feature set each time, just a random one of the six.
>> Apart from the closest method, all the others are usually reversed (e.g.,
>> if the closest method gives 5 stars, the others give 1). However, this
>> might just have happened for the couple of examples I tried (reviews of
>> the Kindle on Amazon).
>>
>> Isn't that more-or-less what one would expect from random output?
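
If the six feature sets each produced an independent, uniformly random star
rating, at least one of them would land within a star of the true rating almost
every time, so the pattern reported above is indeed close to what random output
would give. A quick check under that purely illustrative assumption:

    # Chance that at least one of six uniformly random 1-5 star outputs lands
    # within one star of the true rating (independence and uniformity are
    # assumptions made only for this illustration).
    def p_some_output_close(true_rating, n_methods=6):
        close = sum(1 for r in range(1, 6) if abs(r - true_rating) <= 1)
        return 1 - (1 - close / 5) ** n_methods

    for true_rating in range(1, 6):
        print(true_rating, round(p_some_output_close(true_rating), 3))
    # ~0.95 for a 1- or 5-star review, ~0.996 for a 3-star review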
>>
>>> Social aspects. You have to consider that the reviews on Amazon are
>> written by different authors, each with their own style of writing.
>> Moreover, you have to consider different cultural backgrounds; for example,
>> Americans and Englishmen use different words to express the same things.
>> Goethe used different words than a truck driver does.
>>
>> As a human, and an Englishman, I expect I can understand and fairly judge
>> the sentiment of most reviews written by, say, an American truck driver,
>> without undue reprogramming. Is this really an unrealistic goal for our
>> algorithms? And I wonder: is mastering a highly restricted style or register
>> a necessary step in that direction... or is it in fact a detour?
>>
>>> How can a classifier calculate the weight of a lexical feature if that
>> feature is not present in the analyzed text?
>>
>> By inferring from similarities between that feature and those that *are*
>> present (e.g. through semi-supervised learning/bootstrapping of unannotated
>> data)? That's at least one method about which a fair amount has been written
>> already. I'm not saying it's a solved problem, mind you, but perhaps you're
>> not up against a brick wall yet?
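
One simple flavour of the similarity-based inference mentioned above is to
propagate weights from lexical features seen in the labelled data to unseen
ones through similarity in some vector space. The word vectors and seed weights
below are invented placeholders, there only to show the shape of the
computation.

    # Infer a weight for a feature unseen in the labelled data from the
    # weights of seen features, via cosine similarity of word vectors.
    import numpy as np

    seed_weights = {"excellent": 2.0, "terrible": -2.0, "fine": 0.5}
    vectors = {
        "excellent": np.array([0.9, 0.1]),
        "terrible": np.array([-0.8, 0.2]),
        "fine": np.array([0.4, 0.3]),
        "superb": np.array([0.85, 0.15]),  # never seen in the labelled data
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def inferred_weight(word):
        # Similarity-weighted average of the seed weights.
        sims = {w: cosine(vectors[word], vectors[w]) for w in seed_weights}
        norm = sum(abs(s) for s in sims.values())
        return sum(seed_weights[w] * s for w, s in sims.items()) / norm

    print("superb ->", round(inferred_weight("superb"), 2))  # near 'excellent'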
>>
>> Justin Washtell
>> University of Leeds
>>
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora