[Corpora-List] MI for more than 2 items

Chris Brew christopher.brew at gmail.com
Wed May 15 22:24:11 UTC 2013


I'm not aware of a solid three-term measure of association. My guess is
that there isn't one, but that is only a guess.
It is perfectly straightforward, if you have a two-term association
measure, to ask all the two-term questions for A, B, C. That is:

1) are A and B associated?
2) are B and C associated?
3) are A and C associated?
4) treating AB as a single thing, are AB and C associated?
5) treating BC as a single thing, are BC and A associated?
6) treating AC as a single thing, are AC and B associated?

but it's not obvious that it is reasonable to try to combine the six answers
into an overall figure that is a measure
of anything. What exactly is the question you want this figure to answer?
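
To make the six questions concrete, here is a minimal sketch in Python that asks each of them with PMI as the two-term measure. Everything in it is an assumption made for illustration: the function names (pmi, count_matches, six_questions) are hypothetical, A, B and C are taken to be adjacent tokens in that order, "AC" is read as "A, then any one token, then C", and every count is normalised by the total number of tokens, which is a simplification. It deliberately returns six separate answers rather than one combined figure.

```python
import math


def pmi(joint, left, right, total):
    """PMI in bits from raw counts: log2(p(x,y) / (p(x) * p(y)))."""
    return math.log2(joint * total / (left * right))


def count_matches(tokens, pattern):
    """Count positions where pattern matches; None is a one-token wildcard."""
    k = len(pattern)
    return sum(
        all(p is None or p == t for p, t in zip(pattern, tokens[i:i + k]))
        for i in range(len(tokens) - k + 1)
    )


def six_questions(tokens, a, b, c):
    """Return the six pairwise PMI scores for the sequence a b c.

    A score is None when any count involved is zero, because PMI is
    undefined in that case.
    """
    n = len(tokens)

    def cnt(*pattern):
        return count_matches(tokens, pattern)

    abc = cnt(a, b, c)
    questions = {
        "A ~ B": (cnt(a, b), cnt(a), cnt(b)),
        "B ~ C": (cnt(b, c), cnt(b), cnt(c)),
        "A ~ C": (cnt(a, None, c), cnt(a), cnt(c)),
        "AB ~ C": (abc, cnt(a, b), cnt(c)),
        "BC ~ A": (abc, cnt(b, c), cnt(a)),
        "AC ~ B": (abc, cnt(a, None, c), cnt(b)),
    }
    return {
        label: pmi(joint, left, right, n) if joint and left and right else None
        for label, (joint, left, right) in questions.items()
    }
```

Calling six_questions(tokens, "in", "terms", "of") on a tokenised corpus gives a dictionary with one score per question. Whether those six scores can sensibly be folded into a single number is exactly the open issue above.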






On Wed, May 15, 2013 at 12:57 PM, Lushan Han <lushan1 at umbc.edu> wrote:

> Then, is there any solid, mathematically-based association measure between
> three or more variables?
>
> Thanks,
>
> Lushan Han
>
>
> On Wed, May 15, 2013 at 2:53 PM, Chris Brew <christopher.brew at gmail.com> wrote:
>
>> The mutual information score that lexicographers use is a close relative
>> of the mathematical notion of mutual information between two random
>> variables. Peter Turney and others have been careful to reflect this
>> distinction by using the term 'pointwise mutual information' (PMI)  for the
>> lexicographer's version and MI for the other. Technically, MI is the
>> probability-weighted sum of the PMI over all cells of a two-dimensional
>> matrix of joint probabilities. This means that you
>> can begin to think of PMI as something like "the contribution of a
>> particular pair of words to MI". And lexicographers have had fair success
>> interpreting it this way. The mathematicians tend to look askance at PMI,
>> because of concerns like "the PMI for a pair of words can in principle be
>> negative even when the MI summed over all words is positive. What (the
>> hell) does that mean?"
>>
>> MI is a central notion of information theory, and backed by many useful
>> mathematical results. For the task of measuring word association, the
>> mathematical advantages of MI do not really translate into a preference
>> for using PMI rather than some other measure of association. If it works
>> for you, that's OK. You don't get much extra from the connection to the
>> mathematics.
>>
>> Once you move to three or more terms, things get even more complex. The
>> generalizations of MI to three or more terms are confusing in themselves,
>> just because interactions between three or more variables are much more
>> complicated than interactions between just two. The generalizations of PMI
>> would be at least as messy, possibly worse, so it is no surprise that
>> mathematical support for such generalizations is missing.
>>
>>
>>
>>
>>
>> On Tue, May 14, 2013 at 10:14 AM, Mike Scott <mike at lexically.net> wrote:
>>
>>>  I have had a query about MI (or any other similar statistic) involving
>>> more than two elements:
>>>
>>> "I don't know how to calculate the Mutual Information (MI) for these
>>> 4-word lexical bundles, it seems I can only find the MI score for 2-word
>>> collocations."
>>>
>>> Can anyone advise please?
>>>
>>> Cheers -- Mike
>>>
>>> --
>>> Mike Scott
>>>
>>> ***
>>> If you publish research which uses WordSmith, do let me know so I can include it at http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
>>> ***
>>> University of Aston and Lexical Analysis Software Ltd. mike.scott at aston.ac.uk www.lexically.net
>>>
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>>
>>>
>>
>>
>> --
>> Chris Brew
>>
>>
>>
>


-- 
Chris Brew