[Corpora-List] Distribution of mutual information?

J Washtell lec3jrw at leeds.ac.uk
Sun Mar 15 20:21:43 UTC 2009


Dear Linas,

> Possibly because I described it incorrectly. This is *not* a scatterplot
> of MI(x,y) vs. P(x,y) , as I may have suggested earlier.  This is a
> distribution  of MI(x,y) -- i.e. a graph of the likelihood of observing a
> particular value of mutual information.

This is indeed what I had interpreted your plot as: a probability  
distribution over MI.

> My question was at least partly a statistics question: flipping
> through textbooks, I simply can't find a distribution with log-linear
> slopes.

The Zipf distribution has a single log-linear slope. You are combining
more than one such distribution here via MI, so it is to be expected
that the result is a quasi-symmetrical distribution with log-linear
slopes.

Please compare the attached plot to yours. It is a probability
distribution over log( p(x1)p(y1) / (p(x2)p(y2)) ), where x and y exhibit
approximate Zipf distributions. In other words, it is comparable to
calculating MI upon a random corpus which has no associative structure.
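
For anyone wishing to try something similar, here is a minimal sketch of
one way such a "faux MI" distribution could be simulated. It is not the
code behind the attached plot. The function names, the vocabulary size,
the exponent, and above all the choice to draw word types uniformly
(rather than weighting them by frequency) are assumptions made purely
for illustration.

```python
import numpy as np


def zipf_probs(n_types, exponent=1.0):
    """Probabilities p(r) proportional to r**-exponent for ranks 1..n_types."""
    ranks = np.arange(1, n_types + 1, dtype=float)
    weights = ranks ** -exponent
    return weights / weights.sum()


def faux_mi_samples(n_types=10_000, n_samples=200_000, exponent=1.0, seed=0):
    """Sample log2( p(x1)p(y1) / (p(x2)p(y2)) ) for independent draws.

    Assumption: each of x1, y1, x2, y2 is a word type picked uniformly
    at random from a vocabulary whose frequencies follow a Zipf law.
    """
    rng = np.random.default_rng(seed)
    log_p = np.log2(zipf_probs(n_types, exponent))
    # Four independent type indices per sample: x1, y1, x2, y2.
    idx = rng.integers(0, n_types, size=(n_samples, 4))
    lp = log_p[idx]
    return lp[:, 0] + lp[:, 1] - lp[:, 2] - lp[:, 3]


def log_histogram(samples, bins=60):
    """Return bin centres and log density, skipping empty bins."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    keep = density > 0
    return centres[keep], np.log(density[keep])


if __name__ == "__main__":
    centres, log_density = log_histogram(faux_mi_samples())
    for c, d in zip(centres, log_density):
        print(f"{c:8.2f} {d:8.3f}")
```

Plotting the second column against the first gives the distribution on a
logarithmic vertical axis, which is the view in which the slopes either
side of zero can be inspected for linearity.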

You will notice that it is strikingly similar to yours, although of
course lacking the rightward skew due to association and also the
interesting kinks you observe (suggesting that these were not to do
with my first-guess reason).

I shall be very interested to find out what these kinks are. Please do
report when you crack it!

Best regards,

Justin Washtell
University of Leeds
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FauxMI.jpg
Type: image/jpeg
Size: 20815 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20090315/c40a2af5/attachment-0001.jpg>