[Corpora-List] rated words

Ute Römer ute.roemer at anglistik.uni-hannover.de
Sun Apr 17 15:55:25 UTC 2005


Dear all, 

Here is the article Peet has just mentioned (available at
http://www.newscientist.com/article.ns?id=dn7210&feedId=online-news_rss20).
This Sentiment software sounds interesting --- though it's perhaps not
'safe' for linguistic research. 

Best wishes... Ute

__________________________________

Software agents give out PR advice
10:00 02 April 2005 
Exclusive from New Scientist Print Edition 
Duncan Graham-Rowe 
Governments and big business like to indulge in media spin, and that means
knowing what is being said about them. But finding out is becoming ever more
difficult, with thousands of news outlets, websites and blogs to monitor.

Now a British company is about to launch a software program that can
automatically gauge the tone of any electronic document. It can tell whether
a newspaper article is reporting a political party’s policy in a positive or
negative light, for instance, or whether an online review is praising a
product or damning it. Welcome to the automation of PR. 

Till now, discovering whether the coverage you are getting is positive,
negative or neutral has usually meant hiring a “reputation management” firm.
Teams of people employed by the firm will read through everything written
about a chosen organisation, person, event or issue and report back on how
favourable it is. 

As well as being expensive, this can be a long, slow process, says Nick
Jacobi, director of research for the Corpora Software company in Surrey, UK.
“There’s a massive information overload.” A single news agency may churn out
more than eight articles each hour. That is almost 200 stories a day per
news outlet.

Machine learning
Previous attempts to automate this kind of analysis have used one of two
techniques. In the first, called machine learning, a program is trained by
being given thousands of articles already determined by a human reader to be
positive or negative in tone. 


But learning in this way can lead to mistakes. For example, if a series of
the training articles mentions bomb attacks on a mosque in Iraq, the program
may incorrectly conclude that all other mentions of mosques are negative
too.
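
The mosque example can be reproduced with a toy word-count classifier. This is only a minimal sketch under stated assumptions: the training sentences are invented for illustration, and the smoothed log-odds scoring is a generic technique, not a description of any product mentioned in the article.

```python
import math
from collections import Counter

# Invented training set: (text, label) pairs as a human reader might label them.
TRAIN = [
    ("bomb attack on mosque kills dozens", "neg"),
    ("second bomb attack damages mosque", "neg"),
    ("mosque hit in new bomb attack", "neg"),
    ("festival draws happy crowds", "pos"),
    ("team celebrates a great win", "pos"),
    ("markets rally on good news", "pos"),
]

def train(examples):
    """Count how often each word occurs under each label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def word_score(counts, word):
    """Smoothed log-odds: above 0 leans positive, below 0 leans negative."""
    vocab = set(counts["pos"]) | set(counts["neg"])

    def prob(label):
        total = sum(counts[label].values())
        return (counts[label][word] + 1) / (total + len(vocab))

    return math.log(prob("pos") / prob("neg"))

def classify(counts, text):
    """Label a text by summing the scores of its words."""
    score = sum(word_score(counts, w) for w in text.split())
    return "pos" if score > 0 else "neg"

counts = train(TRAIN)
print(word_score(counts, "mosque") < 0)               # True
print(classify(counts, "mosque welcomes visitors"))   # neg
```

Because "mosque" only ever appeared in negative training articles, the harmless test sentence is labelled negative as well.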

The alternative is the lexicon approach, in which certain words are
classified as either positive or negative. But plenty of words can be both.
“The plot was unpredictable” and “the steering was unpredictable” differ by
just one word. Yet the word “unpredictable” has a positive connotation in
the first example and a negative meaning in the second. 

And even if that problem is solved, just picking up on positive or negative
words can also lead to mistakes, as is demonstrated by the sentence:
“Everyone told me it was terrible, that I would hate it, but in the end it
wasn’t at all bad”.
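
Both failure modes of the lexicon approach show up in a few lines of code. The sketch below assumes a tiny hand-made lexicon (the word list and polarities are invented for illustration) and simply sums word polarities with no regard for context.

```python
import re

# Hypothetical mini-lexicon; the polarities are illustrative assumptions.
LEXICON = {"terrible": -1, "hate": -1, "bad": -1, "unpredictable": -1, "good": 1}

def lexicon_score(text):
    """Sum the polarity of every known word, ignoring all context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

print(lexicon_score("The plot was unpredictable"))      # -1
print(lexicon_score("The steering was unpredictable"))  # -1

review = ("Everyone told me it was terrible, that I would hate it, "
          "but in the end it wasn't at all bad")
print(lexicon_score(review))                            # -3
```

The two "unpredictable" sentences get the same score although one is praise, and the broadly favourable review scores strongly negative because "terrible", "hate" and "bad" are counted at face value.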

So Corpora has come up with a program called Sentiment, which uses
algorithms to tease out grammatical components, such as nouns, verbs and
adjectives, and identify the subjects and objects of verbs. It can even
analyse pronouns like “it”, “he” and “her” to work out what words or
concepts they are referring to.

Having an understanding of grammatical structure makes it possible to filter
out words that are not relevant to the sentiment of the article, Jacobi
says. So instead of assuming that certain words, such as “unpredictable” or
“rubbish”, are positive or negative, it allows the structural context to
disambiguate them.
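
One way to picture structural disambiguation is to make an adjective's polarity depend on the noun it describes. The sketch below is an assumption-laden illustration, not the Sentiment program's actual method: the lookup table is invented, and a single regular expression stands in for a real grammatical parser.

```python
import re

# Hypothetical table: polarity of an adjective given the noun it describes.
CONTEXT_POLARITY = {
    ("plot", "unpredictable"): 1,
    ("steering", "unpredictable"): -1,
}

# Very crude stand-in for a parser: matches only "the <noun> was <adjective>".
PATTERN = re.compile(r"^the (\w+) was (\w+)$")

def contextual_score(sentence):
    """Polarity of the adjective in light of its subject noun, or 0 if unknown."""
    match = PATTERN.match(sentence.lower().strip(" ."))
    if match is None:
        return 0
    noun, adjective = match.groups()
    return CONTEXT_POLARITY.get((noun, adjective), 0)

print(contextual_score("The plot was unpredictable"))      # 1
print(contextual_score("The steering was unpredictable"))  # -1
```

The same word now receives opposite scores depending on its subject, which a plain word list cannot do.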

Expert readers
It does not get it right all the time, Jacobi admits, but then neither do
people. Three expert readers are likely to agree about an article 85% of the
time, and about 90% of non-experts will agree with this consensus. The
Sentiment software agrees with the same expert consensus about 80% of the
time.
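
A figure such as "agrees with the expert consensus 80% of the time" could be computed along the following lines. The article does not say how consensus or agreement were defined, so the majority-vote rule and the labels below are assumptions made purely for illustration.

```python
from collections import Counter

def consensus(labels):
    """Majority label among the readers, or None if there is no majority."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None

def agreement_rate(expert_labels, system_labels):
    """Share of articles with an expert majority on which the system matches it."""
    pairs = [(consensus(e), s) for e, s in zip(expert_labels, system_labels)]
    scored = [(c, s) for c, s in pairs if c is not None]
    return sum(c == s for c, s in scored) / len(scored)

# Invented labels for five articles, three expert readers each.
experts = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "pos"],
    ["neu", "neu", "neu"],
    ["pos", "neg", "neu"],  # no majority, so this article is excluded
    ["neg", "neg", "neg"],
]
system = ["pos", "neg", "pos", "neg", "neg"]

print(agreement_rate(experts, system))  # 0.75
```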

Sentiment was developed principally for Infonic, one of Corpora Software’s
subsidiary companies, which provides clients with online media analysis of
websites, chat rooms, bulletin boards and blogs. The company also hopes to
use it to analyse the news for its clients.

Sentiment will not take the humans out of the equation, says Orlando Plunket
Greene of Infonic, because someone is still going to have to evaluate the
software’s report on each article. But because the program will list items
in terms of how positive, negative or neutral they are, it is possible to
skip to the most relevant items.

“It will allow us to prioritise, and do the job much faster,” he says. While
a person might be able to scan 10 articles an hour, Sentiment can zip
through 10 a second.
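
The prioritising step and the speed comparison can be sketched as follows. The score range (-1 to 1), the article titles and the idea that "most relevant" means "most strongly positive or negative" are all assumptions for illustration; the article does not describe the program's output format.

```python
def prioritise(scored_articles):
    """Order articles so the most strongly positive or negative come first."""
    return sorted(scored_articles, key=lambda item: abs(item[1]), reverse=True)

# Invented (title, score) pairs; scores assumed to run from -1 to 1.
articles = [("Profits steady", 0.1), ("Product recall", -0.9), ("Award won", 0.7)]
print([title for title, _ in prioritise(articles)])
# ['Product recall', 'Award won', 'Profits steady']

# Throughput quoted in the article: 10 articles an hour versus 10 a second.
human_per_hour = 10
software_per_hour = 10 * 60 * 60
print(software_per_hour // human_per_hour)  # 3600
```

On the quoted figures the software gets through 36,000 articles in the hour a person needs for 10, a factor of 3,600.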

What makes this kind of analysis so challenging is that key words in a text
often offer no clues as to what sentiment they carry. Some of the toughest
challenges to comprehension, such as identifying irony and rhetoric, are
likely to remain unsolved for some time.

Related Articles
Software learns to translate by reading up 
http://www.newscientist.com/article.ns?id=dn7054 
22 February 2005 
Machine learns games 'like a human' 
http://www.newscientist.com/article.ns?id=dn6914 
24 January 2005 
Voicemail software recognises callers' emotions 
http://www.newscientist.com/article.ns?id=dn6845 
11 January 2005 
Weblinks
Corpora Software 
http://www.corporasoftware.com/ 
Infonic 
http://www.infonic.com/cgi/index.php4 


> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
> Behalf Of peetm
> Sent: Sunday, April 17, 2005 4:40 PM
> To: 'Stephan Gillmeier'; corpora at uib.no
> Subject: RE: [Corpora-List] rated words
> 
> There's an interesting and semi-related article about this kind of thing
> in the 2nd April New Scientist: Software agents give out PR advice.
> 
> If you've a subscription, you can find the full-text on New Scientist's
> website of course.
> 
> The article mentions a company, Corpora Software
> (http://www.corporasoftware.com/Sentiment.htm)
> 
> peetm
> 


