Corpora: Negative mutual information?

Philip Resnik resnik at umiacs.umd.edu
Thu Mar 8 18:55:34 UTC 2001


>    I have a question about calculating mutual information for bigrams
>    in text.  According to every definition I've seen of MI, the values
>    are non-negative.  However, I've found that for some bigrams made
>    of common words in very uncommon bigrams, the value is less than
>    zero.  Does anyone know how to interpret a negative mutual
>    information?

Where have you seen a definition suggesting (pointwise) MI must be
non-negative?  The definition is based on a comparison between the
observed co-occurrence probability for the two words (i.e. the joint
probability P(x,y)) and the co-occurrence probability one would
expect to see if the two words were independent (i.e. the product of
the marginal probabilities P(x) and P(y)); namely

  I(x,y) = log [ P(x,y) / P(x)P(y) ]

If the two words occur together *exactly* as frequently as one would
expect by chance, the ratio inside the log is equal to 1, giving us
I(x,y) = 0; if they occur more frequently than one would expect by
chance, the ratio is greater than 1 so I(x,y) > 0; and conversely if
they occur less frequently than one would expect by chance, the ratio
is less than 1 so I(x,y) < 0.

Nothing in principle or in practice prevents this last case, and the
interpretation is that the two words are for some reason dissociated
rather than associated, e.g. for linguistic reasons.  For example,
"he" and "write" are probably both quite frequent unigrams, but the
bigram "he write" is highly unlikely because it violates number
agreement between the subject and the verb.  Hence one would predict
I(he,write) < 0.
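
For concreteness, here is a quick Python sketch of the calculation,
using made-up counts purely for illustration:

  import math

  def pmi(count_xy, count_x, count_y, n_bigrams):
      # Pointwise MI: log2[ P(x,y) / (P(x)P(y)) ]
      p_xy = count_xy / n_bigrams   # observed joint probability
      p_x = count_x / n_bigrams     # marginal probability of x
      p_y = count_y / n_bigrams     # marginal probability of y
      return math.log2(p_xy / (p_x * p_y))

  # Two frequent unigrams that almost never co-occur: the ratio
  # inside the log is below 1, so the value comes out negative.
  print(pmi(count_xy=2, count_x=5000, count_y=3000, n_bigrams=1000000))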

That said, note that the *average* mutual information between two
random variables X and Y is defined as the relative entropy
D( P(x,y) || P(x)P(y) ) between the joint and the independence
distributions.  Like any relative entropy, that value is indeed
guaranteed to be non-negative; e.g. see Cover, T. M. and Thomas,
J. A. (1991), Elements of Information Theory, Wiley, New York.  The
term "mutual information" is sometimes used to refer to the
information-theoretic quantity of average mutual information, and
sometimes used to refer to pointwise mutual information, which is a
potential source of confusion.
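
Again for concreteness, here is a small Python sketch using a toy
joint distribution (the probabilities are invented for illustration):
the pointwise terms for the dissociated pairs are negative, but their
weighted sum, the average mutual information, is still non-negative.

  import math

  def average_mi(joint):
      # joint maps (x, y) -> P(x, y); probabilities should sum to 1.
      px, py = {}, {}
      for (x, y), p in joint.items():
          px[x] = px.get(x, 0.0) + p
          py[y] = py.get(y, 0.0) + p
      # D( P(x,y) || P(x)P(y) ): sum of p * log2( p / (P(x)P(y)) )
      return sum(p * math.log2(p / (px[x] * py[y]))
                 for (x, y), p in joint.items() if p > 0)

  joint = {("he", "writes"): 0.30, ("he", "write"): 0.05,
           ("they", "writes"): 0.05, ("they", "write"): 0.60}
  print(average_mi(joint))   # >= 0, like any relative entropy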

  Philip

  ----------------------------------------------------------------
  Philip Resnik, Assistant Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA	   Fax   : (301) 405-7104
  http://umiacs.umd.edu/~resnik	   E-mail: resnik at umiacs.umd.edu
