Hello all,<br><br>And greetings from another semi-lurker. I would just like to add my two cents to the discussion. To my mind, a statistical approach can be of only limited help in building a well-performing and reasonably precise sentiment analysis system. I can imagine that statistics would work well for deciding whether a particular sentence contains a sentiment or not, provided that we have enough data to train our classifier. But for more fine-grained tasks (such as deciding whether a sentiment is positive or negative) I'd rather go for a linguistically oriented approach (with basic grammatical analysis, like shallow parsing, and some pattern-matching techniques). Of course, that's nothing interesting from the scientific point of view, but in my opinion it's at least a good option in terms of system design.<br>
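<br>To make the idea concrete, here is a minimal sketch of pattern-based polarity detection. It is only an illustration under stated assumptions: the tiny lexicon, the negator list, and the two-token negation window are all hypothetical, and a real system would work on shallow-parsed chunks rather than raw tokens.<br>

```python
import re

# Hypothetical toy lexicon and negator list, for illustration only.
POLARITY = {"good": 1, "great": 1, "excellent": 1,
            "bad": -1, "poor": -1, "terrible": -1}
NEGATORS = {"not", "never", "no", "hardly"}


def polarity(sentence: str) -> str:
    """Label a sentence as positive, negative or neutral using simple patterns."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        value = POLARITY[token]
        # Pattern: a negator within the two preceding tokens flips the polarity.
        if any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score == 0:
        return "neutral"
    return "negative"


print(polarity("The screen is not good"))
print(polarity("A great battery"))
```

Rules like the negation pattern are easy to inspect and extend, which is the system-design advantage meant above. The price is coverage: anything outside the lexicon is simply labelled neutral.<br>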


<br>Kind regards,<br>Vladimir Sidorenko<br><br><div class="gmail_quote">2011/12/21 Taras <span dir="ltr"><<a href="mailto:taras8055@gmail.com">taras8055@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Right. You can also add temporal dependency: a trained classifier may become obsolete in several weeks/months. Rules may be less time-dependent, but do not expect high recall from them.<br>

But the companies want it 'made once, used forever and applied to everything'.<span class="HOEnZb"><font color="#888888"><br>

<br>

Taras</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

On 21/12/11 14:23, iain wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

My own suspicion in terms of what Taras was saying is that the expense is in<br>

making the classifier suitable for a domain or genre.  It is quite clear<br>

that something which can decode tweets may not work too well on blogs or<br>

forums like this one!<br>

<br>

Equally, a general classifier won't be as effective on a sub-language as the<br>

one it's trained on.  Scientific vitriol can be couched in terms which only<br>

the very adept outsider will recognise!<br>

<br>

<br>

Iain<br>

-----Original Message-----<br>

From: <a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a> [mailto:<a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a><u></u>] On Behalf Of<br>

Michal Ptaszynski<br>

Sent: 21 December 2011 13:35<br>

To: <a href="mailto:corpora@uib.no" target="_blank">corpora@uib.no</a>; <a href="mailto:corpora-request@uib.no" target="_blank">corpora-request@uib.no</a><br>

Cc: <a href="mailto:ptaszynski@hgu.jp" target="_blank">ptaszynski@hgu.jp</a><br>

Subject: Re: [Corpora-List] EmoText - Software for opinion mining and<br>

lexical affect sensing<br>

<br>

Dear Taras, Iain<br>

Dear All,<br>

<br>

What Iain and Taras say is one of the best things I've heard lately, mostly,<br>

because it confirms my findings too. However, your experience is probably<br>

based more on real world examples.<br>

If you could provide a proof of some kind, or a description of some<br>

examples, this would be a very useful hint.<br>

<br>

Just a word on "keeping classification cheap". I think this is not as much<br>

about the money, as it is about logic (and trying to find it).<br>

For example, it is not much of a research contribution to just have, say, 100<br>

people write a lot of rules. Even if a system created this way achieved<br>

high performance, it's not too interesting from the scientific point of view.<br>

<br>

What we, researchers, try to find is a kind of logical reasoning that could<br>

be represented computationally. So, for example, if Mr.X has a<br>

1000-rule-system that gives him 85% accuracy, and Mr.Y has a<br>

10-(general)-rule-system that gives him 82% accuracy, a researcher would<br>

rather be first interested in Mr.Y's system.<br>

<br>

I think this applies to all fields that have their commercial variations.<br>

For example, each year there is a number of papers on machine translation<br>

presenting high results, but the level of actual machine translation<br>

software available on the market is rather low (As a former translator I<br>

tried about 5 different ones).<br>

<br>

Best,<br>

<br>

Michal<br>

<br>

<br>

---------------------<br>

Od: Taras<<a href="mailto:taras8055@gmail.com" target="_blank">taras8055@gmail.com</a>><br>

Do: <a href="mailto:corpora@uib.no" target="_blank">corpora@uib.no</a><br>

Data: Wed, 21 Dec 2011 10:43:47 +0000<br>

Temat: Re: [Corpora-List] EmoText - Software for opinion mining and<br>

lexical affect sensing<br>

<br>

<br>

Hi<br>

I am a developer of one of the commercial tools, and I think there are two<br>

major problems that prevent them from being more accurate:<br>

1. They try to keep classification cheap. Cheap means generic. But the<br>

only way of getting a good sentiment accuracy is making classifiers<br>

specific. But this is expensive.<br>

2. The other problem is the neutrality bias. Most texts are<br>

neutral or balanced, which makes extraction of non-neutrals very<br>

difficult. The problem actually is not subjectivity or sentiment<br>

classification taken separately, but the combination of the two.<br>

<br>

Of course there are other problems: noisy language, various ways of<br>

expressing sentiment etc. But the two aforementioned are the most<br>

business-specific ones.<br>

<br>

Regards,<br>

<br>

Taras Zagibalov<br>

<br>

On 21/12/11 09:52, iain wrote:<br>

I've been following this thread with interest.  I'm a commercial<br>

semi-lurker<br>

rather than an involved theorist, but my colleagues and I have done some<br>

work with some of the available commercial sentiment tools.<br>

<br>

Our experience is that they are really not very accurate.<br>

<br>

There are some issues with evaluating them.  I'm not using a pre-marked<br>

gold<br>

standard to score them, but rather submitting text from web pages to them<br>

and comparing the output with the text, which makes our results far from<br>

scientific!  And we've done dozens not thousands.<br>

<br>

What we tend to find is that we look at the output and then at the text and<br>

more often than not say, 'uh'.  Pretty much as the reviewers of Alexander's<br>

test site have been doing!  Which might make Alexander's work close to the<br>

commercial state of the art  :-J   ....<br>

<br>

Some of the reviews of commercial tools I've seen seem to indicate that if<br>

you take the 'neutral' sentiment articles out then the actual accuracy<br>

drops<br>

down from the claimed 70% quite considerably.  In short, the tools are very<br>

good at detecting no sentiment but rather poorer at getting actual<br>

sentiment<br>

right.<br>
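
<br>

A small worked example of this effect, using entirely made-up counts (100 documents, 70 of them neutral) chosen only so that the overall figure comes out at 70%; none of these numbers come from a real tool:<br>

```python
def accuracy(pairs):
    """Fraction of (gold, predicted) pairs where the labels agree."""
    pairs = list(pairs)
    return sum(gold == predicted for gold, predicted in pairs) / len(pairs)


# Hypothetical evaluation set as (gold label, predicted label) pairs.
results = (
    [("neutral", "neutral")] * 63
    + [("neutral", "positive")] * 7
    + [("positive", "positive")] * 4
    + [("positive", "neutral")] * 11
    + [("negative", "negative")] * 3
    + [("negative", "neutral")] * 12
)

# Keep only the documents whose gold label actually carries sentiment.
non_neutral = [(g, p) for g, p in results if g != "neutral"]

print(f"overall accuracy:     {accuracy(results):.2f}")
print(f"non-neutral accuracy: {accuracy(non_neutral):.2f}")
```

With these invented counts the overall accuracy is 0.70, but only 7 of the 30 sentiment-bearing documents are classified correctly, i.e. about 0.23.<br>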

<br>

I was wondering if anyone on the list had experience with the commercial<br>

tools and what sort of results they found.  Could they recommend one or<br>

another of the suppliers?  I'd also be interested if any tool suppliers<br>

(also commercially semi-lurking  ) might have some input to this - what are<br>

their real expectations of quality?<br>

<br>

<br>

Iain<br>

-----Original Message-----<br>

From: <a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a> [mailto:<a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a><u></u>] On Behalf Of<br>

Justin Washtell<br>

Sent: 20 December 2011 20:32<br>

To: Alexander Osherenko; <a href="mailto:ptaszynski@ieee.org" target="_blank">ptaszynski@ieee.org</a><br>

Cc: <a href="mailto:corpora@uib.no" target="_blank">corpora@uib.no</a>; <a href="mailto:corpora-request@uib.no" target="_blank">corpora-request@uib.no</a><br>

Subject: Re: [Corpora-List] EmoText - Software for opinion mining and<br>

lexical affect sensing<br>

<br>

Michal and Alexander,<br>

<br>

<br>

<br>

I thoroughly agree with Michal (and Graham) that these kinds of demo are a<br>

good thing, and despite my - ongoing - criticisms, I'd like to take my hat<br>

off to Alexander for sharing this work. There are already countless papers<br>

describing technical approaches to this-and-that, and showing<br>

impressive-looking results achieved upon [perhaps sometimes carefully<br>

selected or tuned-to] test datasets. But I suspect that there's presently<br>

no<br>

better way to get a feel for where the state-of-the-art really is (and to<br>

shed some qualitative light on matters) than by complementing these works<br>

with some inquisitive and unrestrained hands-on tinkering.<br>

<br>

<br>

<br>

I tried a couple of reviews from Amazon. Among different feature sets from<br>

1 to 6, always one is close to Amazon's ranking, but unfortunately it's<br>

never one feature set in particular, but rather randomly one from the six.<br>

Besides the closest method, all others are usually reversed (e.g., if the<br>

closest method gives 5 stars, all others give 1). However, this might have<br>

just happened for those couple of examples I tried (Reviews of Kindle on<br>

Amazon).<br>

<br>

<br>

<br>

Isn't that more-or-less what one would expect from random output?<br>

<br>

<br>

<br>

Social aspects. You have to consider that the reviews from Amazon are<br>

composed by different authors that have their own style of writing.<br>

Moreover, you have to consider different cultural background, for example,<br>

Americans and Englishmen use different words to express the same things. Goethe<br>

used other words than a truck driver does.<br>

<br>

<br>

<br>

As a human, and an Englishman, I expect I can understand and fairly judge<br>

the sentiment of most reviews written by, say, an American truck driver,<br>

without undue reprogramming. Is this really an unrealistic goal for our<br>

algorithms? And I wonder, is mastering a highly restricted style or<br>

register<br>

a necessary step in that direction... or is it in fact a detour?<br>

<br>

<br>

<br>

How can a classifier calculate a weight of a lexical feature if this<br>

lexical feature is not present in the analyzed text?<br>

<br>

<br>

<br>

By inferring from similarities between that feature and those that *are*<br>

present (e.g. through semi-supervised learning/bootstrapping of unannotated<br>

data)? That's at least one method about which a fair amount has been<br>

written<br>

already. I'm not saying it's a solved problem mind you, but perhaps you're<br>

not up against a brick wall yet?<br>
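
<br>

One possible reading of this idea, as a rough sketch rather than any particular published method: estimate the weight of an unseen word as a similarity-weighted average of the weights of known words, where similarity is the cosine between co-occurrence vectors gathered from unannotated text. The vectors, the words and the weights below are all hypothetical.<br>

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def infer_weight(word, contexts, known_weights):
    """Similarity-weighted average of the weights of already known words."""
    sims = {w: cosine(contexts[word], contexts[w])
            for w in known_weights if w in contexts}
    total = sum(sims.values())
    if total == 0:
        return 0.0  # nothing similar was observed, so stay neutral
    return sum(sims[w] * known_weights[w] for w in sims) / total


# Hypothetical co-occurrence counts collected from unannotated reviews.
contexts = {
    "superb": {"really": 3, "screen": 2, "love": 4},
    "great": {"really": 5, "love": 3, "screen": 1},
    "awful": {"really": 2, "returned": 4, "refund": 3},
}
# Hypothetical weights learned from the annotated training data.
known_weights = {"great": 1.0, "awful": -1.0}

print(round(infer_weight("superb", contexts, known_weights), 2))
```

Here "superb" never occurs in the annotated data, yet it receives a positive weight because its contexts resemble those of "great" much more than those of "awful".<br>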

<br>

<br>

<br>

Justin Washtell<br>

University of Leeds<br>

<br>

_______________________________________________<br>

UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>

Corpora mailing list<br>

<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

<br>

<br>


</blockquote>

<br>

<br>


</div></div></blockquote></div><br>