[Corpora-List] Fwd: Re: Distribution of mutual information?

J Washtell lec3jrw at leeds.ac.uk
Mon Mar 16 00:00:54 UTC 2009


For those interested in Linas' MI plot: John Goldsmith pointed out
(below) that my previous post and plot on this topic were somewhat
confusing and might benefit from clarification, particularly with
regard to the significance of log( p(x1)p(y1) / p(x2)p(y2) ). Upon
re-reading, I concur - this is very cryptic :-)

The plot is intended to illustrate that the shape of Linas'
distribution arises simply from the independent Zipf distributions of
the word tokens, combined with the general structure of MI (i.e.
log(a/b)).

So, specifically, mutual information is defined as:

log ( p(x,y) / p(x)p(y) )

Linas showed that this produces an interesting distribution:
interesting in the first instance because, while not immediately
recognizable as any "classic" distribution, it is nonetheless very
simple geometrically speaking.
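
For concreteness, here is a minimal sketch (mine, in Python, rather
than anything from Linas' post) of how this quantity might be
computed over the adjacent word pairs of a tokenized corpus; "tokens"
is assumed to be a plain list of word strings:

    # Pointwise MI, log( p(x,y) / p(x)p(y) ), over adjacent word pairs.
    from collections import Counter
    from math import log

    def pmi_values(tokens):
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n_uni = len(tokens)
        n_bi = len(tokens) - 1
        return [log((c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
                for (x, y), c in bigrams.items()]

Histogramming the returned values gives the kind of distribution
Linas plotted.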

One could go to the trouble of plotting log( p(x,y) / p(x)p(y) ) for
a *randomly* generated corpus (where the token frequencies observe
Zipf's law, but there is no associative structure) to convince
oneself that the shape does not arise from any linguistic phenomenon.
However, besides being a little trouble to produce, this might
nonetheless obscure the root cause of the observed distribution by
suggesting that it has something to do with the distribution of
observed co-occurrences (the joint probability), even in a random
corpus - which it does not.
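
For anyone who does want to go that route, the simulation might look
something like the following Python sketch; the vocabulary size,
corpus length and simple 1/rank frequency law are arbitrary choices
of mine:

    # Random corpus with Zipfian token frequencies but no associative
    # structure; histogram log( p(x,y) / p(x)p(y) ) over adjacent pairs.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    vocab = 5000
    probs = 1.0 / np.arange(1, vocab + 1)   # frequency ~ 1/rank
    probs /= probs.sum()
    tokens = rng.choice(vocab, size=200_000, p=probs)  # independent draws

    pairs, counts = np.unique(np.stack([tokens[:-1], tokens[1:]]),
                              axis=1, return_counts=True)
    uni = np.bincount(tokens, minlength=vocab) / len(tokens)
    mi = np.log((counts / counts.sum()) / (uni[pairs[0]] * uni[pairs[1]]))

    plt.hist(mi, bins=100)
    plt.show()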

Rather, the pertinent fact is that the numerator and the denominator
each comprise variables having Zipf distributions. Plotting
log( p(x1)p(y1) / p(x2)p(y2) ) - x1, y1, x2 and y2 all being
independent Zipf-distributed variables - which takes no co-occurrence
into account yet still produces this same shape, is a good way to
illustrate this... and was very easy to mock up with a few tens of
thousands of randomly generated numbers in a spreadsheet.
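
A scripted equivalent of that mock-up, for anyone wanting to
reproduce it (a hypothetical sketch - I actually used a spreadsheet,
and the exact Zipf parameters don't matter for the shape):

    # Four independent Zipf-distributed variables; no co-occurrence at all.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    vocab = 5000
    probs = 1.0 / np.arange(1, vocab + 1)   # p(rank) ~ 1/rank
    probs /= probs.sum()

    n = 50_000  # "a few tens of thousands" of samples
    x1, y1, x2, y2 = (rng.choice(vocab, size=n, p=probs) for _ in range(4))
    values = np.log(probs[x1] * probs[y1] / (probs[x2] * probs[y2]))

    plt.hist(values, bins=200)
    plt.xlabel("log( p(x1)p(y1) / p(x2)p(y2) )")
    plt.show()

The resulting histogram has the same symmetric shape, despite no
co-occurrence information being involved.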

This formula is comparable to MI insomuch as it is the ratio of the
distribution p(x)p(y) - where p(x) and p(y) are Zipf distributed - to
one which is very similar, and insomuch as it produces the same
characteristic symmetric log-linear shape.

The implication, therefore, is that the *linguistically* pertinent
features of Linas' distribution are manifest in its deviations from
this shape: A) the rightward skew, due to p(x,y) capturing actual
associative structure in the language, and B) the pronounced kinks,
which, as Linas observes (and I agree), are much more interesting.

Best regards,

Justin Washtell
University of Leeds

>
>> Please compare the attached plot to yours. It is a probability
>> distribution over log( p(x1)p(y1) / p(x2)p(y2) ), where x and y
>> exhibit approximate Zipf distributions. In other words, it is
>> comparable to calculating MI upon a random corpus which has no
>> associative structure.
> Could you post something in which you explain a bit more what you did
> (and perhaps even why)? Since MI specifically compares joint to
> marginal probabilities, I'm having trouble seeing why your expression
> is comparable to MI.
> thanks,
> John Goldsmith




----- End forwarded message -----


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


