Corpora: Negative mutual information?

Ted Pedersen ted_pedersen at hotmail.com
Thu Mar 8 15:49:00 UTC 2001


Hi David,

I'm guessing that what you are looking at are pointwise
Mutual Information values, usually defined along these
lines for a bigram 'word1 word2':

log [ freq(word1,word2) * N / (freq(word1) * freq(word2)) ]

where N is the number of bigrams in your sample.
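As a minimal sketch of that calculation in Python (the function
name and the choice of natural log are mine; base 2 is also common):

    import math

    def pmi(freq_bigram, freq_w1, freq_w2, n_bigrams):
        """Pointwise mutual information of a bigram 'word1 word2',
        computed from raw frequency counts as in the formula above."""
        return math.log(freq_bigram * n_bigrams / (freq_w1 * freq_w2))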

This will go negative when

freq(word1,word2)*N < freq(word1)*freq(word2)

or

N < freq(word1)*freq(word2)/freq(word1,word2)

So what does a negative value tell us? Well, it suggests
that word1 and/or word2 are very high frequency words
(the, and, a ... come to mind) that occur together in the
bigram under consideration less often than chance would predict.
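To make that concrete with some purely made-up counts, using
the sketch above:

    # Hypothetical counts: N = 1,000,000 bigrams in the sample,
    # freq(the) = 70,000, freq(of) = 36,000, but the bigram
    # 'the of' is observed only 100 times.
    print(pmi(100, 70000, 36000, 1000000))   # about -3.23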

You can also look at the relationship

freq(word1,word2) < freq(word1)*freq(word2)/N

The right-hand side of this inequality is the expected value
of the frequency count of the bigram 'word1 word2' under
the classical assumption of independence (which underlies
tests like Pearson's chi-squared and the log-likelihood ratio).
So a negative pointwise mutual information value tells us that
the observed frequency count for a bigram is less than we would
expect under the assumption that the words in the bigram
are independent.
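In code, continuing the sketch above (again, the names are my
own choices):

    def expected_count(freq_w1, freq_w2, n_bigrams):
        """Expected bigram count if word1 and word2 were
        independent: N * (freq_w1/N) * (freq_w2/N)."""
        return freq_w1 * freq_w2 / n_bigrams

    # PMI is negative exactly when the observed count falls
    # below this expectation:
    print(expected_count(70000, 36000, 1000000))  # 2520.0, vs. observed 100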

I have puzzled a bit over this notion of being 'less than
what would be expected under independence'. Does this just
mean that the words in the bigram are independent, or is
something further suggested? I'd be interested if anyone else
has some thoughts on that particular issue...

In any case, I'm not sure how good a tool pointwise mutual
information is (see the Manning and Schutze text, for example,
for some reasons for concern), but it does raise some
interesting issues, no doubt.

Regards,
Ted

---
Ted Pedersen
http://www.d.umn.edu/~tpederse


