[Corpora-List] Distribution of mutual information?
J Washtell
lec3jrw at leeds.ac.uk
Sun Mar 15 20:21:43 UTC 2009
Dear Linas,
> Possibly because I described it incorrectly. This is *not* a scatterplot
> of MI(x,y) vs. P(x,y) , as I may have suggested earlier. This is a
> distribution of MI(x,y) -- i.e. a graph of the likelihood of observing a
> particular value of mutual information.
This is indeed what I had interpreted your plot as: a probability
distribution over MI.
> My question was at least partly a statistics question: flipping
> through textbooks, I simply can't find a distribution with log-linear
> slopes.
The Zipf distribution has a single log-linear slope. You are combining
more than one such distribution here via MI, and taking logarithms turns
a power-law tail into an exponential one (a straight line on a log
scale), so it is to be expected that the result is a quasi-symmetrical
distribution with log-linear slopes.
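The step from "power law" to "log-linear slope" can be checked numerically. The following is a minimal sketch, not taken from the original post. It uses a continuous Pareto distribution as an assumed stand-in for Zipf, and the function names and parameters are hypothetical. If X is Pareto with tail exponent alpha, log X is exponentially distributed with rate alpha. A histogram of the logs should therefore have a straight-line log-count slope of roughly -alpha and a mean of roughly 1/alpha.

```python
import math
import random


def log_pareto_sample(alpha, n, seed=0):
    """Draw n Pareto(alpha) values (a continuous power law, used here as an
    assumed stand-in for Zipf) and return their natural logarithms."""
    rng = random.Random(seed)
    # random.paretovariate(alpha) returns X >= 1 with P(X > x) = x ** (-alpha),
    # so log(X) is exponentially distributed with rate alpha.
    return [math.log(rng.paretovariate(alpha)) for _ in range(n)]


def log_histogram_slope(values, width, upper):
    """Bin values over [0, upper) and fit a least-squares line to
    log(bin count) against bin centre; returns the fitted slope."""
    nbins = int(round(upper / width))
    counts = [0] * nbins
    for v in values:
        k = int(v // width)
        if 0 <= k < nbins:
            counts[k] += 1
    # Only non-empty bins can be log-transformed.
    pts = [((k + 0.5) * width, math.log(c)) for k, c in enumerate(counts) if c > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


if __name__ == "__main__":
    logs = log_pareto_sample(2.0, 100_000)
    print("mean of logs:", sum(logs) / len(logs))                # near 1/alpha
    print("log-count slope:", log_histogram_slope(logs, 0.25, 3.0))  # near -alpha
```

A real word-frequency list is discrete and finite, so its tail will only approximate this straight line.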
Please compare the attached plot to yours. It is a probability
distribution over log(p(x1)p(y1) / (p(x2)p(y2))), where x and y exhibit
approximate Zipf distributions. In other words, it is comparable to
calculating MI on a random corpus that has no associative structure.
You will notice that it is strikingly similar to yours, although it of
course lacks the rightward skew due to association, as well as the
interesting kinks you observe (which suggests that those kinks have
nothing to do with my first-guess explanation).
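For anyone wanting to reproduce a plot like the attached one, here is a minimal sketch of one possible reading of the construction. The post does not say exactly how the plot was generated. The vocabulary sizes, the Zipf exponent s, independent token-weighted sampling and the function names are all hypothetical choices. All four words are drawn independently, so there is no association, and the distribution is symmetric about zero.

```python
import math
import random
from collections import Counter


def zipf_log_probs(n_types, s):
    """Log-probabilities of a Zipf law, p(r) proportional to r ** (-s), r = 1..n_types."""
    weights = [r ** (-s) for r in range(1, n_types + 1)]
    log_total = math.log(sum(weights))
    return [math.log(w) - log_total for w in weights]


def faux_mi_samples(n, n_x=5000, n_y=5000, s=1.0, seed=0):
    """Sample log(p(x1)p(y1) / (p(x2)p(y2))) with x1, x2 ~ Zipf over n_x types
    and y1, y2 ~ Zipf over n_y types, all drawn independently (no association)."""
    rng = random.Random(seed)
    lpx, lpy = zipf_log_probs(n_x, s), zipf_log_probs(n_y, s)
    wx = [math.exp(v) for v in lpx]
    wy = [math.exp(v) for v in lpy]
    # Token-weighted draws: each word type is picked with its Zipf probability.
    x1 = rng.choices(range(n_x), weights=wx, k=n)
    y1 = rng.choices(range(n_y), weights=wy, k=n)
    x2 = rng.choices(range(n_x), weights=wx, k=n)
    y2 = rng.choices(range(n_y), weights=wy, k=n)
    return [lpx[a] + lpy[b] - lpx[c] - lpy[d] for a, b, c, d in zip(x1, y1, x2, y2)]


def histogram(values, width):
    """Count values in bins of the given width, keyed by bin lower edge."""
    counts = Counter(math.floor(v / width) for v in values)
    return {k * width: counts[k] for k in sorted(counts)}


if __name__ == "__main__":
    samples = faux_mi_samples(200_000)
    # Plot log(count) against the bin edge to compare with a real MI distribution.
    for edge, count in histogram(samples, 1.0).items():
        print(f"{edge:6.1f} {count}")
```

The Zipf normalising constants cancel in the ratio. Each sample is therefore bounded in magnitude by s * (log n_x + log n_y).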
I shall be very interested to find out what these kinks are. Please do
report back when you crack it!
Best regards,
Justin Washtell
University of Leeds
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FauxMI.jpg
Type: image/jpeg
Size: 20815 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20090315/c40a2af5/attachment-0001.jpg>
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora