[Corpora-List] Distribution of mutual information?

Linas Vepstas linasvepstas at gmail.com
Mon Mar 16 03:28:27 UTC 2009


2009/3/15 J Washtell <lec3jrw at leeds.ac.uk>:
> Dear Linas,
>
>> distribution  of MI(x,y) -- i.e. a graph of the likelihood of observing a
>> particular value of mutual information.
>
> The Zipf distribution has a single log-linear slope. You are combining more
> than one such distribution here via MI, so it is to be expected that the
> result is a quasi-symmetrical distribution with log-linear slopes.
>
> Please compare the attached plot to yours. It is a probability distribution
> over log(p(x1)p(y1)/p(x2)p(y2)), where x and y exhibit approximate Zipf
> distributions. In other words, it is comparable to calculating MI upon a
> random corpus which has no associative structure.
>
> You will notice that it is strikingly similar to yours, although of course
> lacking the rightward skew due to association and also the interesting kinks
> you observe.

Will, I've taken a more circuitous route, and have not yet arrived
at your results. At the moment, I'm trying to generate
random texts. It appears that my random texts are not
Zipfian enough; they're rather stair-steppy, and this produces
MI graphs with Gaussian fall-offs to the sides. I think I'm
learning that it's not as easy to create a Zipfian distribution
as it is made out to be.
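
For anyone who wants to reproduce the experiment, here is a minimal
sketch of one way to set it up; it is not the code behind the graphs.
It assumes a rank-frequency law p(r) ~ 1/r^s, draws the two words of
each pair independently (so there is no associative structure), and
takes MI(x,y) to mean log2( p(x,y) / (p(x) p(y)) ) estimated from the
sampled counts. The function names, vocabulary size, exponent and bin
width are all placeholders of my own choosing.

```python
import math
import random
from collections import Counter


def zipf_weights(vocab_size, exponent=1.0):
    """Unnormalized Zipf weights: the word of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]


def sample_pairs(num_pairs, vocab_size, exponent=1.0, seed=0):
    """Draw word pairs whose left and right words are independent Zipfian samples."""
    rng = random.Random(seed)
    weights = zipf_weights(vocab_size, exponent)
    words = range(vocab_size)
    left = rng.choices(words, weights=weights, k=num_pairs)
    right = rng.choices(words, weights=weights, k=num_pairs)
    return list(zip(left, right))


def pair_mi(pairs):
    """Return {(x, y): log2(p(x,y) / (p(x) p(y)))} for every observed pair."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(x for x, _ in pairs)
    right_counts = Counter(y for _, y in pairs)
    # (c/n) / ((cx/n) * (cy/n)) simplifies to c*n / (cx*cy).
    return {
        (x, y): math.log2(c * n / (left_counts[x] * right_counts[y]))
        for (x, y), c in pair_counts.items()
    }


def mi_histogram(mi_values, bin_width=0.5):
    """Fraction of distinct pairs falling in each MI bin, keyed by the bin's lower edge."""
    bins = Counter(math.floor(v / bin_width) for v in mi_values)
    total = sum(bins.values())
    return {b * bin_width: count / total for b, count in sorted(bins.items())}


if __name__ == "__main__":
    pairs = sample_pairs(num_pairs=200_000, vocab_size=5_000)
    for edge, fraction in mi_histogram(pair_mi(pairs).values()).items():
        print(f"{edge:6.1f}  {fraction:.5f}")
```

Plotting the second column on a log scale against the first shows
whether the sides fall off linearly or faster. Note that this
histogram counts each distinct pair once; weighting each pair by its
frequency is a different choice and gives a different curve.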

An expanded, ongoing semi-diary/journal,
semi-paper-in-preparation with a dozen graphs
is at http://linas.org/nlp/word-pairs.pdf

--linas

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


