[Corpora-List] MI for more than 2 items

Lushan Han lushan1 at umbc.edu
Wed May 15 19:57:21 UTC 2013


Then, is there any solid, mathematically based association measure for
three or more variables?

Thanks,

Lushan Han


On Wed, May 15, 2013 at 2:53 PM, Chris Brew <christopher.brew at gmail.com> wrote:

> The mutual information score that lexicographers use is a close relative
> of the mathematical notion of mutual information between two random
> variables. Peter Turney and others have been careful to reflect this
> distinction by using the term 'pointwise mutual information' (PMI) for the
> lexicographer's version and MI for the other. Technically, MI is the
> probability-weighted sum of the PMI over all cells of the two-dimensional
> joint distribution, i.e. the expected value of PMI. This means that you
> can begin to think of PMI as something like "the contribution of a
> particular pair of words to MI", and lexicographers have had fair success
> interpreting it this way. Mathematicians tend to look askance at PMI
> because of concerns like "the PMI for a pair of words can in principle be
> negative even when the MI summed over all words is positive. What (the
> hell) does that mean?"
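
A minimal Python sketch of the PMI/MI relationship, assuming a hypothetical 2x2 joint distribution for two binary events (word A present/absent, word B present/absent). The function names `pmi` and `mutual_information` are illustrative only. It shows that MI is the expectation of PMI (each cell weighted by p(x, y)), and that an individual cell can have negative PMI while the overall MI is still positive.

```python
import math


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information, in bits, for a single cell."""
    return math.log2(p_xy / (p_x * p_y))


def mutual_information(joint):
    """Mutual information, in bits, of a 2-D joint distribution (list of rows).

    MI is the expected value of PMI: every cell's PMI is weighted by its
    joint probability p(x, y). Cells with p(x, y) == 0 contribute nothing.
    """
    p_x = [sum(row) for row in joint]          # row marginals
    p_y = [sum(col) for col in zip(*joint)]    # column marginals
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0:
                total += p_xy * pmi(p_xy, p_x[i], p_y[j])
    return total


if __name__ == "__main__":
    # Hypothetical distribution: rows = word A present/absent,
    # columns = word B present/absent.
    joint = [[0.4, 0.1],
             [0.1, 0.4]]
    print(pmi(joint[0][1], 0.5, 0.5))  # negative: A without B is rarer than chance
    print(mutual_information(joint))   # still positive overall
```

The off-diagonal cells have negative PMI, but they are outweighed by the positive, higher-probability diagonal cells, so the weighted sum stays positive.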
>
> MI is a central notion of information theory and is backed by many useful
> mathematical results. For the task of measuring word association, however,
> the mathematical advantages of MI do not really translate into a preference
> for using PMI rather than some other measure of association. If it works
> for you, that's OK; you just don't get much extra from the connection to
> the mathematics.
>
> Once you move to three or more terms, things get even more complex. The
> generalizations of MI to three or more terms are confusing in themselves,
> just because interactions between three or more variables are much more
> complicated than interactions between just two. The generalizations of PMI
> would be at least as messy, possibly worse, so it is no surprise that
> mathematical support for such generalizations is missing.
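
To illustrate why three-variable generalizations are confusing, here is a sketch of two commonly used ones, total correlation and interaction information (co-information), computed from a joint distribution stored as a dict of outcome tuples. The helper names are hypothetical. The XOR example is a standard textbook case: every pair of variables has zero MI, yet the three are jointly dependent, and interaction information comes out negative under the sign convention used here (some authors flip the sign).

```python
import math
from collections import defaultdict
from itertools import product


def entropy(joint, idx):
    """Entropy in bits of the marginal of `joint` on the variable positions in idx.

    `joint` maps outcome tuples to probabilities, e.g. {(x, y, z): p}.
    """
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def pairwise_mi(joint, i, j):
    """Ordinary two-variable MI between positions i and j."""
    return entropy(joint, (i,)) + entropy(joint, (j,)) - entropy(joint, (i, j))


def total_correlation(joint, n):
    """Sum of the n marginal entropies minus the joint entropy.

    Always >= 0, and 0 exactly when the n variables are mutually independent.
    """
    return sum(entropy(joint, (i,)) for i in range(n)) - entropy(joint, tuple(range(n)))


def interaction_information(joint):
    """Three-variable co-information, I(X;Y) - I(X;Y|Z), by inclusion-exclusion.

    Unlike MI it can be negative; some authors use the opposite sign.
    """
    def h(*idx):
        return entropy(joint, idx)

    return (h(0) + h(1) + h(2)
            - h(0, 1) - h(0, 2) - h(1, 2)
            + h(0, 1, 2))


if __name__ == "__main__":
    # X and Y are independent fair bits, Z = X XOR Y.
    xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
    print([pairwise_mi(xor, i, j) for i, j in ((0, 1), (0, 2), (1, 2))])  # each pair looks independent
    print(total_correlation(xor, 3))     # yet the triple is dependent
    print(interaction_information(xor))  # negative under this sign convention
```

Total correlation behaves like MI (non-negative, zero under independence), while interaction information isolates the purely three-way part and can change sign, which is one source of the confusion.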
>
>
>
>
>
> On Tue, May 14, 2013 at 10:14 AM, Mike Scott <mike at lexically.net> wrote:
>
>>  I have had a query about MI (or any other similar statistic) involving
>> more than two elements:
>>
>> "I don't know how to calculate the Mutual Information (MI) for these
>> 4-word lexical bundles, it seems I can only find the MI score for 2-word
>> collocations."
>>
>> Can anyone advise please?
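
One ad hoc way to get an MI-style score for a 4-word bundle is to compare the bundle's probability with the product of its words' probabilities, log2[p(w1..w4) / (p(w1)...p(w4))]. This is a sketch under stated assumptions, not a method proposed in this thread: the function name is hypothetical, the counts in the example are made up, and the number of 4-gram positions is approximated by the token count. With two words it reduces to ordinary PMI, and in the same way that MI is the expected value of PMI, its expectation over all word sequences gives the total correlation of the word positions.

```python
import math


def pointwise_total_correlation(bundle_count, word_counts, n_tokens):
    """log2[p(w1..wk) / (p(w1) * ... * p(wk))], estimated from raw corpus counts.

    With two words this is ordinary PMI. As a simplifying assumption, the
    number of k-gram positions is approximated by n_tokens, so every
    probability is a count divided by n_tokens.
    """
    if bundle_count <= 0:
        raise ValueError("bundle must occur at least once (log of zero)")
    p_bundle = bundle_count / n_tokens
    p_independent = 1.0
    for count in word_counts:
        p_independent *= count / n_tokens  # probability under full independence
    return math.log2(p_bundle / p_independent)


if __name__ == "__main__":
    # Made-up counts for a 4-word bundle such as "at the end of"
    # in a hypothetical 1,000,000-token corpus.
    score = pointwise_total_correlation(50, [5000, 60000, 800, 30000], 1_000_000)
    print(round(score, 2))
```

Scores for longer bundles tend to be larger, because each extra word multiplies another small probability into the denominator, so 4-word scores are not directly comparable with 2-word PMI values.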
>>
>> Cheers -- Mike
>>
>> --
>> Mike Scott
>>
>> ***
>> If you publish research which uses WordSmith, do let me know so I can include it at http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
>> ***
>> University of Aston and Lexical Analysis Software Ltd.
>> mike.scott at aston.ac.uk
>> www.lexically.net
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>
>
> --
> Chris Brew
>

