[Corpora-List] MI for more than 2 items
Chris Brew
christopher.brew at gmail.com
Wed May 15 18:53:49 UTC 2013
The mutual information score that lexicographers use is a close relative of
the mathematical notion of mutual information between two random variables.
Peter Turney and others have been careful to reflect this distinction by
using the term 'pointwise mutual information' (PMI) for the
lexicographer's version and MI for the other. Technically, MI is the
probability-weighted sum of the PMI over all cells of a two-dimensional
matrix (that is, the expected value of the PMI). This means that you
can begin to think of PMI as something like "the contribution of a
particular pair of words to MI". And lexicographers have had fair success
interpreting it this way. The mathematicians tend to look askance at PMI,
because of concerns like "the PMI for a pair of words can in principle be
negative even when the MI summed over all words is positive. What (the
hell) does that mean?"
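
To make the relationship concrete, here is a minimal sketch in Python. It computes the PMI of each cell of a small co-occurrence table and then MI as the probability-weighted sum of those PMI values. The function name pmi_table and the counts are invented purely for illustration; they are not taken from any corpus.

```python
import math

def pmi_table(counts):
    """Return (pmi, mi) for a dict mapping (x, y) pairs to joint counts.

    pmi maps each observed pair to log2(p(x,y) / (p(x) * p(y))).
    mi is the sum over pairs of p(x,y) * pmi(x,y), in bits.
    """
    total = sum(counts.values())
    px, py = {}, {}
    for (x, y), c in counts.items():
        px[x] = px.get(x, 0) + c  # marginal count of the first word
        py[y] = py.get(y, 0) + c  # marginal count of the second word
    pmi, mi = {}, 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # an empty cell contributes nothing to MI
        pxy = c / total
        score = math.log2(pxy / ((px[x] / total) * (py[y] / total)))
        pmi[(x, y)] = score
        mi += pxy * score
    return pmi, mi

# Toy counts, invented for illustration only.
counts = {
    ("strong", "tea"): 30, ("strong", "computer"): 5,
    ("powerful", "tea"): 5, ("powerful", "computer"): 30,
}
pmi, mi = pmi_table(counts)
for pair, score in sorted(pmi.items()):
    print(pair, round(score, 3))
print("MI:", round(mi, 3))
```

With these toy counts the rare pairs get a negative PMI while the MI of the whole table is positive, which is exactly the situation the mathematicians' complaint describes.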
MI is a central notion of information theory, and is backed by many useful
mathematical results. For the task of measuring word association, however,
the mathematical advantages of MI do not really translate into a preference
for using PMI rather than some other measure of association. If it works
for you, that's OK. You don't get much extra from the connection to the
mathematics.
Once you move to three or more terms, things get even more complex. The
generalizations of MI to three or more terms are confusing in themselves,
just because interactions between three or more variables are much more
complicated than interactions between just two. The generalizations of PMI
would be at least as messy, possibly worse, so it is no surprise that
mathematical support for such generalizations is missing.
On Tue, May 14, 2013 at 10:14 AM, Mike Scott <mike at lexically.net> wrote:
> I have had a query about MI (or any other similar statistic) involving
> more than two elements:
>
> "I don't know how to calculate the Mutual Information (MI) for these
> 4-word lexical bundles, it seems I can only find the MI score for 2-word
> collocations."
>
> Can anyone advise please?
>
> Cheers -- Mike
>
> --
> Mike Scott
>
> ***
> If you publish research which uses WordSmith, do let me know so I can include it at http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
> ***
> University of Aston and Lexical Analysis Software Ltd.
> mike.scott at aston.ac.uk
> www.lexically.net
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
--
Chris Brew