[Corpora-List] Distribution of mutual information?
Linas Vepstas
linasvepstas at gmail.com
Mon Mar 16 03:28:27 UTC 2009
2009/3/15 J Washtell <lec3jrw at leeds.ac.uk>:
> Dear Linas,
>
>> distribution of MI(x,y) -- i.e. a graph of the likelihood of observing a
>> particular value of mutual information.
>
> The zipf distribution has a single log-linear slope. You are combining more
> than one such distribution here via MI, so it is to be expected that the
> result is a quasi-symmetrical distribution with log-linear slopes.
>
> Please compare the attached plot to yours. It is a probability distribution
> over log(p(x1)p(y1)/p(x2)p(y2)), where x and y exhibit approximate zipf
> distributions. In other words, it is comparable to calculating MI upon a
> random corpus which has no associative structure.
>
> You will notice that it is strikingly similar to yours, although of course
> lacking the rightward skew due to association and also the interesting kinks
> you observe.
Will, I've taken a more circuitous route, and have not yet arrived
at your results. At the moment, I'm trying to generate
random texts. It appears that my random texts are not
Zipfian enough; they're rather stair-steppy, and this produces
MI graphs with Gaussian fall-offs to the sides. I think I'm
learning that it's not as easy to create a Zipfian distribution
as it is made out to be.
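For anyone who wants to try the random-corpus baseline themselves, here
is a minimal sketch of one way to do it. It is not the procedure behind
either set of graphs: it assumes words are drawn independently with
weight 1/rank (so the corpus has no associative structure), that "MI"
means the pointwise value log2(p(x,y) / (p(x) p(y))), and that the
histogram counts each distinct pair once. The function names, vocabulary
size, pair count and bin width are all made up for illustration.

```python
import math
import random
from collections import Counter

def zipf_weights(vocab_size, exponent=1.0):
    """Unnormalised Zipfian weights: rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]

def random_pairs(num_pairs, vocab_size, exponent=1.0, seed=0):
    """Draw independent (x, y) pairs; each side follows the same Zipfian law."""
    rng = random.Random(seed)
    vocab = range(vocab_size)
    weights = zipf_weights(vocab_size, exponent)
    xs = rng.choices(vocab, weights=weights, k=num_pairs)
    ys = rng.choices(vocab, weights=weights, k=num_pairs)
    return list(zip(xs, ys))

def pointwise_mi(pairs):
    """Return {(x, y): log2(p(x,y) / (p(x) p(y)))} from empirical counts."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    # p(x,y) / (p(x) p(y)) = (c/n) / ((cx/n) * (cy/n)) = c * n / (cx * cy)
    return {
        (x, y): math.log2(c * n / (x_counts[x] * y_counts[y]))
        for (x, y), c in pair_counts.items()
    }

def mi_histogram(mi_values, bin_width=0.5):
    """Bin MI values; returns {bin lower edge: fraction of distinct pairs}."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in mi_values)
    total = sum(bins.values())
    return {edge: count / total for edge, count in sorted(bins.items())}

# Illustrative sizes only; the original corpora are far larger.
pairs = random_pairs(num_pairs=200_000, vocab_size=5_000)
mi = pointwise_mi(pairs)
hist = mi_histogram(mi.values())
for edge, frac in hist.items():
    print(f"{edge:6.1f}  {frac:.4f}")
```

Plotting the printed fractions on a log scale against the bin edge should
show whether the sides fall off linearly or faster; changing `exponent`
or `vocab_size` is a cheap way to see how sensitive the shape is to how
Zipfian the sample really is.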
An expanded, ongoing semi-diary/journal,
semi-paper-in-preparation with a dozen graphs
is at http://linas.org/nlp/word-pairs.pdf
--linas