[Corpora-List] EmoText - Software for opinion mining and lexical affect sensing
Wladimir Sidorenko
wlsidorenko at gmail.com
Wed Dec 21 15:29:03 UTC 2011
Hello all,
And greetings from another semi-lurker. I would just like to add my two
cents to the discussion. To my mind, a statistical approach can be of only
limited help in building a well-performing and reasonably precise Sentiment
Analysis system. I can imagine that statistics would work well for
deciding whether a particular sentence contains a sentiment or not,
provided that we have enough data to train our classifier. But for more
fine-grained tasks (such as deciding whether a sentiment is positive or
negative) I'd rather go for a linguistically oriented approach (with basic
grammatical analysis, such as shallow parsing, and some pattern-matching
techniques). Of course, that's nothing interesting from the scientific point
of view, but in my opinion it's at least a good option in terms of system
design.
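As a rough illustration of the two-stage design described above, here is a
minimal sketch, not a real system. Everything in it is an assumption made
for the example: the toy lexicon, the negator list, the two-token negation
window and all function names are hypothetical, and the first stage is a
plain lexicon lookup standing in for a trained statistical classifier.

```python
import re

# Hypothetical toy lexicon; a real system would use a curated resource.
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
NEGATORS = {"not", "never", "no", "hardly"}


def tokenize(sentence):
    """Lowercase, expand "n't" to " not", and return alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower().replace("n't", " not"))


def is_subjective(tokens):
    """Stage 1 (stand-in): a trained classifier would make this decision."""
    return any(tok in POLARITY for tok in tokens)


def polarity(tokens, window=2):
    """Stage 2: sum lexicon scores, flipping the sign of a word when a
    negator occurs within `window` tokens before it."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POLARITY:
            value = POLARITY[tok]
            if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
                value = -value
            score += value
    return score


def classify(sentence):
    """Return 'positive', 'negative' or 'neutral' for one sentence."""
    tokens = tokenize(sentence)
    if not is_subjective(tokens):
        return "neutral"
    score = polarity(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, classify("I don't love this screen.") comes out as negative
because "not" falls inside the window before "love". A real pattern matcher
would work on shallow-parse chunks rather than a fixed token window.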
Kind regards,
Vladimir Sidorenko
2011/12/21 Taras <taras8055 at gmail.com>
> Right. You can also add temporal dependency: a trained classifier may
> become obsolete in several weeks/months. Rules may be less time-dependent,
> but do not expect high recall from them.
> But the companies want it 'made once, used forever and applied to
> everything'.
>
> Taras
>
>
> On 21/12/11 14:23, iain wrote:
>
>> My own suspicion in terms of what Taras was saying is that the expense is
>> in making the classifier suitable for a domain or genre. It is quite clear
>> that something which can decode tweets may not work too well on blogs or
>> forums like this one!
>>
>> Equally, a general classifier won't be as effective on a sub-language as
>> the one it's trained on. Scientific vitriol can be couched in terms which
>> only the very adept outsider will recognise!
>>
>>
>> Iain
>> -----Original Message-----
>> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf
>> Of Michal Ptaszynski
>> Sent: 21 December 2011 13:35
>> To: corpora at uib.no; corpora-request at uib.no
>> Cc: ptaszynski at hgu.jp
>> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
>> lexical affect sensing
>>
>> Dear Taras, Iain
>> Dear All,
>>
>> What Iain and Taras say is one of the best things I've heard lately,
>> mostly because it confirms my findings too. However, your experience is
>> probably based more on real-world examples.
>> If you could provide a proof of some kind, or a description of some
>> examples, this would be a very useful hint.
>>
>> Just a word on "keeping classification cheap". I think this is not so much
>> about the money as it is about logic (and trying to find it).
>> For example, it is not much of a research contribution to just have, say,
>> 100 people write a lot of rules. Even if a system created this way
>> achieved high performance, it's not too interesting from the scientific
>> point of view.
>>
>> What we, researchers, try to find is a kind of logical reasoning that
>> could be represented computationally. So, for example, if Mr.X has a
>> 1000-rule-system that gives him 85% accuracy, and Mr.Y has a
>> 10-(general)-rule-system that gives him 82% accuracy, a researcher would
>> rather be first interested in Mr.Y's system.
>>
>> I think this applies to all fields that have their commercial variations.
>> For example, each year there are a number of papers on machine translation
>> presenting high results, but the level of actual machine translation
>> software available on the market is rather low (as a former translator I
>> tried about 5 different ones).
>>
>> Best,
>>
>> Michal
>>
>>
>> ---------------------
>> From: Taras<taras8055 at gmail.com>
>> To: corpora at uib.no
>> Date: Wed, 21 Dec 2011 10:43:47 +0000
>> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
>> lexical affect sensing
>>
>>
>> Hi
>> I am a developer of one of the commercial tools, and I think there are two
>> major problems that prevent them from being more accurate:
>> 1. They try to keep classification cheap. Cheap means generic. But the
>> only way of getting good sentiment accuracy is to make classifiers
>> specific, and that is expensive.
>> 2. The other problem is the neutrality bias. Most texts are neutral or
>> balanced, which makes extracting the non-neutral ones very difficult.
>> The problem actually is not subjectivity or sentiment classification
>> taken separately, but the combination of the two.
>>
>> Of course there are other problems: noisy language, various ways of
>> expressing sentiment etc. But the two aforementioned are the most
>> business-specific ones.
>>
>> Regards,
>>
>> Taras Zagibalov
>>
>> On 21/12/11 09:52, iain wrote:
>> I've been following this thread with interest. I'm a commercial
>> semi-lurker rather than an involved theorist, but my colleagues and I have
>> done some work with some of the available commercial sentiment tools.
>>
>> Our experience is that they are really not very accurate.
>>
>> There are some issues with evaluating them. I'm not using a pre-marked
>> gold standard to score them, but rather submitting text from web pages to
>> them and comparing the output with the text, which makes our results far
>> from scientific! And we've done dozens, not thousands.
>>
>> What we tend to find is that we look at the output and then at the text
>> and more often than not say, 'uh'. Pretty much as the reviewers of
>> Alexander's test site have been doing! Which might make Alexander's work
>> close to the commercial state of the art :-J ....
>>
>> Some of the reviews of commercial tools I've seen seem to indicate that if
>> you take the 'neutral' sentiment articles out then the actual accuracy
>> drops down from the claimed 70% quite considerably. In short, the tools
>> are very good at detecting no sentiment but rather poorer at getting
>> actual sentiment right.
>>
>> I was wondering if anyone on the list had experience with the commercial
>> tools and what sort of results they found. Could they recommend one or
>> another of the suppliers? I'd also be interested if any tool suppliers
>> (also commercially semi-lurking) might have some input on this: what are
>> their real expectations of quality?
>>
>>
>> Iain
>> -----Original Message-----
>> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf
>> Of Justin Washtell
>> Sent: 20 December 2011 20:32
>> To: Alexander Osherenko; ptaszynski at ieee.org
>> Cc: corpora at uib.no; corpora-request at uib.no
>> Subject: Re: [Corpora-List] EmoText - Software for opinion mining and
>> lexical affect sensing
>>
>> Michal and Alexander,
>>
>>
>>
>> I thoroughly agree with Michal (and Graham) that these kinds of demo are a
>> good thing, and despite my - ongoing - criticisms, I'd like to take my hat
>> off to Alexander for sharing this work. There are already countless papers
>> describing technical approaches to this-and-that, and showing
>> impressive-looking results achieved upon [perhaps sometimes carefully
>> selected or tuned-to] test datasets. But I suspect that there's presently
>> no better way to get a feel for where the state-of-the-art really is (and
>> to shed some qualitative light on matters) than by complementing these
>> works with some inquisitive and unrestrained hands-on tinkering.
>>
>>
>>
>> I tried a couple of reviews from Amazon. Among the feature sets from
>> 1 to 6, one is always close to Amazon's ranking, but unfortunately it's
>> never one feature set in particular, but rather a random one of the six.
>> Apart from the closest method, all the others are usually reversed (e.g.,
>> if the closest method gives 5 stars, all the others give 1). However, this
>> might have just happened for the couple of examples I tried (reviews of
>> the Kindle on Amazon).
>>
>>
>>
>> Isn't that more-or-less what one would expect from random output?
>>
>>
>>
>> Social aspects. You have to consider that the reviews from Amazon are
>> composed by different authors who have their own styles of writing.
>> Moreover, you have to consider different cultural backgrounds; for example,
>> Americans and Englishmen use different words to express the same things.
>> Goethe used different words than a truck driver does.
>>
>>
>>
>> As a human, and an Englishman, I expect I can understand and fairly judge
>> the sentiment of most reviews written by, say, an American truck driver,
>> without undue reprogramming. Is this really an unrealistic goal for our
>> algorithms? And I wonder, is mastering a highly restricted style or
>> register a necessary step in that direction... or is it in fact a detour?
>>
>>
>>
>> How can a classifier calculate a weight of a lexical feature if this
>> lexical feature is not present in the analyzed text?
>>
>>
>>
>> By inferring from similarities between that feature and those that *are*
>> present (e.g. through semi-supervised learning/bootstrapping of
>> unannotated data)? That's at least one method about which a fair amount
>> has been written already. I'm not saying it's a solved problem, mind you,
>> but perhaps you're not up against a brick wall yet?
>>
>>
>>
>> Justin Washtell
>> University of Leeds
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora