Corpora: Negative mutual information?
Philip Resnik
resnik at umiacs.umd.edu
Thu Mar 8 18:55:34 UTC 2001
> I have a question about calculating mutual information for bigrams
> in text. According to every definition I've seen of MI, the values
> are non-negative. However, I've found that for some bigrams made
> of common words in very uncommon bigrams, the value is less than
> zero. Does anyone know how to interpret a negative mutual
> information?
Where have you seen a definition suggesting (pointwise) MI must be
non-negative? The definition is based on a comparison between the
observed co-occurrence probability for the two words (i.e. the joint
probability P(x,y)) and the co-occurrence probability one would
expect to see if the two words were independent (i.e. the product of
the marginal probabilities P(x) and P(y)); namely
I(x,y) = log [ P(x,y) / P(x)P(y) ]
If the two words occur together *exactly* as frequently as one would
expect by chance, the ratio inside the log is equal to 1, giving us
I(x,y) = 0; if they occur more frequently than one would expect by
chance, the ratio is greater than 1 so I(x,y) > 0; and conversely if
they occur less frequently than one would expect by chance, the ratio
is less than 1 so I(x,y) < 0.
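If it helps to make this concrete, here is a quick sketch of the
computation in Python. The function name and all counts are mine,
purely for illustration; probabilities are estimated as simple
relative frequencies from hypothetical corpus counts.

    import math

    def pmi(bigram_count, count_x, count_y, n_tokens):
        """Pointwise mutual information (in bits) of a bigram (x, y).

        P(x,y), P(x), and P(y) are estimated as relative frequencies
        from the given corpus counts.
        """
        p_xy = bigram_count / n_tokens
        p_x = count_x / n_tokens
        p_y = count_y / n_tokens
        return math.log2(p_xy / (p_x * p_y))

    # A strong collocation: the pair occurs far more often than
    # independence predicts, so PMI is well above zero.
    print(pmi(bigram_count=400, count_x=500, count_y=600,
              n_tokens=1_000_000))   # about +10.4 bits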
Nothing in principle or in practice prevents this last case, and the
interpretation is that the two words are for some reason dissociated
rather than associated, e.g. for linguistic reasons. For example,
"he" and "write" are probably both quite frequent unigrams, but the
bigram "he write" is highly unlikely because it violates number
agreement between the subject and the object. Hence one would predict
I(he,write) < 0.
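Plugging made-up counts of that sort into the sketch above (again,
the numbers are invented, not drawn from any real corpus) produces
exactly this kind of negative value:

    # "he" and "write" are each frequent unigrams, but the
    # bigram "he write" is vanishingly rare.
    print(pmi(bigram_count=1, count_x=10_000, count_y=2_000,
              n_tokens=1_000_000))   # log2(0.05), about -4.3 bits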
That said, note that the *average* mutual information between two
random variables X and Y is defined as the relative entropy
D( P(x,y) || P(x)P(y) ) between the joint and the independence
distributions. Like any relative entropy, that value is indeed
guaranteed to be non-negative; e.g. see Cover, T. M. and Thomas,
J. A. (1991), Elements of Information Theory, Wiley, New York. The
term "mutual information" is sometimes used to refer to the
information-theoretic quantity of average mutual information, and
sometimes used to refer to pointwise mutual information, which is a
potential source of confusion.
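For completeness, here is a small sketch of the average quantity,
computed directly from the relative-entropy definition over an
invented toy joint distribution. Note that two of the four cells
have negative pointwise MI, yet the average still comes out
non-negative, as the theorem guarantees.

    import math

    def average_mi(joint):
        """Average mutual information I(X;Y) in bits, i.e. the
        relative entropy D( P(x,y) || P(x)P(y) ).

        `joint` maps (x, y) pairs to probabilities summing to 1.
        """
        px, py = {}, {}
        for (x, y), p in joint.items():
            px[x] = px.get(x, 0.0) + p
            py[y] = py.get(y, 0.0) + p
        # Expected pointwise MI; zero-probability cells contribute 0.
        return sum(p * math.log2(p / (px[x] * py[y]))
                   for (x, y), p in joint.items() if p > 0)

    # The (A,b) and (B,a) cells occur less often than independence
    # predicts (0.1 < 0.25), so their pointwise MI is negative...
    joint = {("A", "a"): 0.4, ("A", "b"): 0.1,
             ("B", "a"): 0.1, ("B", "b"): 0.4}
    print(average_mi(joint))  # ...but the average is about +0.28 bits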
Philip
----------------------------------------------------------------
Philip Resnik, Assistant Professor
Department of Linguistics and Institute for Advanced Computer Studies
1401 Marie Mount Hall
University of Maryland
College Park, MD 20742 USA
UMIACS phone: (301) 405-6760
Linguistics phone: (301) 405-8903
Fax: (301) 405-7104
http://umiacs.umd.edu/~resnik
E-mail: resnik at umiacs.umd.edu