[Corpora-List] EmoText - Software for opinion mining and lexical affect sensing

Graham White graham at eecs.qmul.ac.uk
Wed Dec 21 15:40:28 UTC 2011


It's also struck me, after the discussion about how hard it is to 
extract sentiment from Twitter, that lots of people use Twitter to 
display how witty and intelligent they are in only 140 characters 
(displaying your intelligence in a hard medium is more effective than 
merely displaying your intelligence), and so, under the circumstances, 
you probably get a lot of irony, sarcasm, double meaning, and 
indirections of all sorts. These sorts of rhetorical devices are, of 
course, very hard to do machine learning on.

Graham

On 21/12/11 15:29, Wladimir Sidorenko wrote:
> Hello all,
>
> And greetings from another semi-lurker. I'd just like to put in my two
> cents. To my mind, a statistical approach can be of only limited help in
> building a well-performing and reasonably precise Sentiment Analysis
> system. I can imagine that statistics would work well for deciding
> whether a particular sentence contains a sentiment or not, provided we
> have enough data to train our classifier. But for more fine-grained
> tasks (such as deciding whether a sentiment is positive or negative) I'd
> rather go for a linguistically oriented approach, with basic grammatical
> analysis - like shallow parsing - and some pattern-matching techniques.
> Of course that's nothing interesting from the scientific point of view,
> but in my opinion it's at least a good option in terms of system design.
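>
> For illustration, a minimal sketch of such a hybrid pipeline in Python
> (the training sentences, the tiny lexicon and all names are invented for
> the example; scikit-learn supplies the statistical step, and a crude
> negation pattern stands in for real shallow parsing):
>
>     import re
>     from sklearn.feature_extraction.text import CountVectorizer
>     from sklearn.linear_model import LogisticRegression
>     from sklearn.pipeline import make_pipeline
>
>     # Step 1: statistical subjectivity detection (needs labelled data).
>     subj_train = ["I love this phone", "Battery life is terrible",
>                   "The box contains a charger", "It was released in 2010"]
>     subj_labels = [1, 1, 0, 0]   # 1 = carries sentiment, 0 = neutral
>     subjectivity = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
>                                  LogisticRegression())
>     subjectivity.fit(subj_train, subj_labels)
>
>     # Step 2: rule-based polarity with a toy lexicon and a negation
>     # pattern (a very rough stand-in for shallow parsing).
>     POSITIVE = {"love", "great", "excellent"}
>     NEGATIVE = {"terrible", "awful", "poor"}
>     NEGATION = re.compile(r"\b(not|never|no)\b[\w\s]{0,20}$")
>
>     def polarity(sentence):
>         text, score = sentence.lower(), 0
>         for m in re.finditer(r"\w+", text):
>             word = m.group()
>             if word in POSITIVE or word in NEGATIVE:
>                 sign = 1 if word in POSITIVE else -1
>                 if NEGATION.search(text[:m.start()]):  # negated context
>                     sign = -sign
>                 score += sign
>         if score > 0:
>             return "positive"
>         return "negative" if score < 0 else "mixed"
>
>     def analyse(sentence):
>         if subjectivity.predict([sentence])[0] == 0:
>             return "neutral"
>         return polarity(sentence)
>
>     print(analyse("This camera is not great"))
>
> The point of the split is only the division of labour: the learned step
> answers "is there any sentiment here at all?", while the polarity
> decision stays with a small set of inspectable rules.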
>
> Kind regards,
> Vladimir Sidorenko
>
> 2011/12/21 Taras <taras8055 at gmail.com>
>
>     Right. You can also add temporal dependency: a trained classifier
>     may become obsolete in several weeks or months. Rules may be less
>     time-dependent, but do not expect high recall.
>     But the companies want it 'made once, used forever and applied to
>     everything'.
>
>     Taras
>
>
>     On 21/12/11 14:23, iain wrote:
>
>         My own suspicion, in terms of what Taras was saying, is that the
>         expense is in making the classifier suitable for a domain or
>         genre.  It is quite clear that something which can decode tweets
>         may not work too well on blogs or forums like this one!
>
>         Equally, a general classifier won't be as effective on a
>         sub-language as one trained on that sub-language.  Scientific
>         vitriol can be couched in terms which only the very adept
>         outsider will recognise!
>
>
>         Iain
>         -----Original Message-----
>         From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no]
>         On Behalf Of Michal Ptaszynski
>         Sent: 21 December 2011 13:35
>         To: corpora at uib.no; corpora-request at uib.no
>         Cc: ptaszynski at hgu.jp
>         Subject: Re: [Corpora-List] EmoText - Software for opinion
>         mining and lexical affect sensing
>
>         Dear Taras, Iain
>         Dear All,
>
>         What Iain and Taras say is one of the best things I've heard
>         lately, mostly because it confirms my findings too. However,
>         your experience is probably based more on real-world examples.
>         If you could provide evidence of some kind, or a description of
>         some examples, that would be a very useful hint.
>
>         Just a word on "keeping classification cheap". I think this is
>         not so much about the money as about logic (and trying to find
>         it). For example, it is not much of a research contribution to
>         simply have, say, 100 people write a lot of rules. Even if a
>         system created this way achieved high performance, it's not too
>         interesting from the scientific point of view.
>
>         What we researchers try to find is a kind of logical reasoning
>         that can be represented computationally. So, for example, if
>         Mr. X has a 1000-rule system that gives him 85% accuracy, and
>         Mr. Y has a system of 10 general rules that gives him 82%
>         accuracy, a researcher would be interested in Mr. Y's system
>         first.
>
>         I think this applies to all fields that have their commercial
>         variants. For example, each year there are a number of papers on
>         machine translation reporting high results, but the quality of
>         the machine translation software actually available on the
>         market is rather low (as a former translator, I have tried about
>         five different ones).
>
>         Best,
>
>         Michal
>
>
>         ---------------------
>         From: Taras <taras8055 at gmail.com>
>         To: corpora at uib.no
>         Date: Wed, 21 Dec 2011 10:43:47 +0000
>         Subject: Re: [Corpora-List] EmoText - Software for opinion
>         mining and lexical affect sensing
>
>
>         Hi
>         I am a developer of one of the commercial tools, and I think
>         there are two major problems that prevent them from being more
>         accurate:
>         1. They try to keep classification cheap. Cheap means generic,
>         but the only way to get good sentiment accuracy is to make the
>         classifiers specific - and that is expensive.
>         2. The other problem is the neutrality bias. Most texts are
>         neutral or balanced, which makes extraction of the non-neutral
>         ones very difficult. The problem is actually not subjectivity or
>         sentiment classification taken separately, but the combination
>         of the two (a rough illustration follows below).
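>
>         A back-of-the-envelope sketch of that combination, with purely
>         hypothetical numbers (the 20% share of sentiment-bearing texts
>         and the per-step accuracies of 90% and 85% are assumptions, not
>         measurements):
>
>             p_nonneutral = 0.20   # assumed share of texts carrying sentiment
>             subj_acc     = 0.90   # assumed accuracy of the neutral filter
>             polarity_acc = 0.85   # assumed accuracy of the polarity step
>
>             true_pos  = p_nonneutral * subj_acc              # correctly flagged
>             false_pos = (1 - p_nonneutral) * (1 - subj_acc)  # neutrals let through
>
>             precision  = true_pos / (true_pos + false_pos)
>             end_to_end = precision * polarity_acc
>
>             print(f"precision on flagged non-neutrals:   {precision:.2f}")   # ~0.69
>             print(f"flagged items with correct polarity: {end_to_end:.2f}")  # ~0.59
>
>         Even with decent per-step accuracy, the neutral majority floods
>         the flagged set with false positives, and the polarity step then
>         removes a further slice - which is why the two problems bite
>         hardest in combination.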
>
>         Of course there are other problems: noisy language, various ways of
>         expressing sentiment etc. But the two aforementioned are the most
>         business-specific ones.
>
>         Regards,
>
>         Taras Zagibalov
>
>         On 21/12/11 09:52, iain wrote:
>         I've been following this thread with interest.  I'm a commercial
>         semi-lurker rather than an involved theorist, but my colleagues
>         and I have done some work with some of the available commercial
>         sentiment tools.
>
>         Our experience is that they are really not very accurate.
>
>         There are some issues with evaluating them.  I'm not using a
>         pre-marked gold standard to score them, but rather submitting
>         text from web pages to them and comparing the output with the
>         text, which makes our results far from scientific!  And we've
>         done dozens, not thousands.
>
>         What we tend to find is that we look at the output, then at the
>         text, and more often than not say, 'uh'.  Pretty much as the
>         reviewers of Alexander's test site have been doing!  Which might
>         make Alexander's work close to the commercial state of the
>         art  :-J   ....
>
>         Some of the reviews of commercial tools I've seen seem to
>         indicate that if you take the 'neutral' sentiment articles out,
>         the actual accuracy drops quite considerably from the claimed
>         70%.  In short, the tools are very good at detecting no
>         sentiment but rather poorer at getting actual sentiment right.
>
>         I was wondering whether anyone on the list has experience with
>         the commercial tools and what sort of results they found.  Could
>         they recommend one or another of the suppliers?  I'd also be
>         interested if any tool suppliers (also commercially
>         semi-lurking) might have some input on this - what are their
>         real expectations of quality?
>
>
>         Iain
>         -----Original Message-----
>         From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no]
>         On Behalf Of Justin Washtell
>         Sent: 20 December 2011 20:32
>         To: Alexander Osherenko; ptaszynski at ieee.org
>         Cc: corpora at uib.no; corpora-request at uib.no
>         Subject: Re: [Corpora-List] EmoText - Software for opinion
>         mining and lexical affect sensing
>
>         Michal and Alexander,
>
>
>
>         I thoroughly agree with Michal (and Graham) that these kinds of
>         demo are a good thing, and despite my - ongoing - criticisms,
>         I'd like to take my hat off to Alexander for sharing this work.
>         There are already countless papers describing technical
>         approaches to this-and-that, and showing impressive-looking
>         results achieved on [perhaps sometimes carefully selected or
>         tuned-to] test datasets. But I suspect that there's presently no
>         better way to get a feel for where the state-of-the-art really
>         is (and to shed some qualitative light on matters) than by
>         complementing these works with some inquisitive and unrestrained
>         hands-on tinkering.
>
>
>
>         I tried a couple of reviews from Amazon. Among the different
>         feature sets from 1 to 6, one is always close to the Amazon
>         rating, but unfortunately it's never one feature set in
>         particular, rather a seemingly random one of the six. Apart from
>         the closest method, all the others are usually reversed (e.g.,
>         if the closest method gives 5 stars, all the others give 1).
>         However, this might have just happened for the couple of
>         examples I tried (reviews of the Kindle on Amazon).
>
>
>
>         Isn't that more-or-less what one would expect from random output?
>
>
>
>         Social aspects. You have to consider that the reviews on Amazon
>         are written by different authors, each with their own style of
>         writing. Moreover, you have to consider different cultural
>         backgrounds; for example, Americans and Englishmen use different
>         words to express the same things. Goethe used different words
>         than a truck driver does.
>
>
>
>         As a human, and an Englishman, I expect I can understand and
>         fairly judge the sentiment of most reviews written by, say, an
>         American truck driver, without undue reprogramming. Is this
>         really an unrealistic goal for our algorithms? And I wonder, is
>         mastering a highly restricted style or register a necessary step
>         in that direction... or is it in fact a detour.
>
>
>
>         How can a classifier calculate the weight of a lexical feature
>         if that feature is not present in the analyzed text?
>
>
>
>         By inferring from similarities between that feature and those
>         that *are* present (e.g. through semi-supervised
>         learning/bootstrapping on unannotated data)? That's at least one
>         method about which a fair amount has already been written. I'm
>         not saying it's a solved problem, mind you, but perhaps you're
>         not up against a brick wall yet?
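>
>         For what it's worth, here is a minimal sketch of that idea in
>         Python (the toy corpus and seed words are invented for the
>         example): a word never seen in labelled data gets a weight from
>         its co-occurrence profile in unannotated text, by similarity to
>         known positive and negative seed words.
>
>             from collections import Counter
>             from math import sqrt
>
>             corpus = [  # unannotated text, used only for co-occurrence counts
>                 "the plot was superb and the acting was superb",
>                 "the plot was dreadful and the pacing was dreadful",
>                 "the acting was stellar and the soundtrack was stellar",
>             ]
>             SEEDS = {"superb": +1.0, "dreadful": -1.0}   # known polar words
>
>             def contexts(word, window=3):
>                 """Counter of words within `window` tokens of `word`."""
>                 ctx = Counter()
>                 for sent in corpus:
>                     toks = sent.split()
>                     for i, t in enumerate(toks):
>                         if t == word:
>                             ctx.update(toks[max(0, i - window):i])
>                             ctx.update(toks[i + 1:i + 1 + window])
>                 return ctx
>
>             def cosine(a, b):
>                 dot = sum(a[k] * b[k] for k in a)
>                 na = sqrt(sum(v * v for v in a.values()))
>                 nb = sqrt(sum(v * v for v in b.values()))
>                 return dot / (na * nb) if na and nb else 0.0
>
>             def inferred_weight(word):
>                 """Similarity-weighted vote of the seed words."""
>                 vec = contexts(word)
>                 return sum(pol * cosine(vec, contexts(seed))
>                            for seed, pol in SEEDS.items())
>
>             # "stellar" is not a seed, but shares contexts with "superb"
>             print(inferred_weight("stellar"))   # small positive value
>
>         On real data one would of course use a much larger corpus and
>         something like label propagation over a word graph, but the
>         principle is the same.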
>
>
>
>         Justin Washtell
>         University of Leeds
>

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


