[Corpora-List] Distribution of mutual information?

J Washtell lec3jrw at leeds.ac.uk
Fri Mar 13 02:45:01 UTC 2009


Dear Linas,

The weight of your plot is slightly to the right of neutral (MI=0)  
because language exhibits positive associative structure, of which  
apparent negative association can be viewed as a necessary  
side-effect. The first plot you feature on your blog is more lop-sided  
than the second because the latter considers only bigrams and so  
captures less of the associative structure of the language (i.e. it  
does not capture longer-range dependencies). I have not been able to
reproduce your plot, but my best guess is that the log-linear sides are
an artifact of the Zipfian distribution of word frequencies in the  
language; you will see this distribution very plainly if you plot the  
distribution of just P(x) or of P(y).
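
For concreteness, here is a minimal Python sketch of the quantity under
discussion: MI(x,y) = log2( P(x,y) / (P(x) P(y)) ) for word pairs that
share a sentence. The estimator is an assumption, not necessarily the one
used for the plot: P(x) is taken as the fraction of sentences containing
x, and P(x,y) as the fraction containing both. The names sentence_pmi and
histogram are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def sentence_pmi(sentences):
    """Return {(x, y): MI in bits} for unordered word pairs sharing a sentence.

    Assumption: probabilities are sentence-level relative frequencies,
    so MI(x, y) = log2(n_xy * N / (n_x * n_y)) with N sentences in total.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        words = sorted(set(sent))  # count each word once per sentence
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    return {
        (x, y): math.log2(n_xy * n / (word_counts[x] * word_counts[y]))
        for (x, y), n_xy in pair_counts.items()
    }


def histogram(values, bin_width=1.0):
    """Count values per bin, keyed by the lower edge of each bin."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


toy = [["a", "b"], ["a", "b"], ["c", "d"], ["c", "d"]]
pmi = sentence_pmi(toy)
print(pmi[("a", "b")])          # positive: "a" and "b" always co-occur
print(histogram(pmi.values()))
```

Only pairs that are actually observed together receive a score here, so
pairs that never co-occur (MI of minus infinity) do not appear in the
histogram at all.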

The presence of a clearly defined kink is more interesting. I can  
hazard that it might have to do with the necessarily coarse jumps in  
MI that occur in the presence of very low frequency words (of which  
there are many). Or it might indicate a notable difference in MI
distributions between, say, pairs comprising content words and those
wholly or partly comprising function words (the plot seems to be
consistent with there being an extra contingent of high-frequency,
lower-than-normal-MI word pairs). These are both very much guesses
though.
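
The "coarse jumps" guess can be illustrated with a small sketch. It
assumes the same sentence-level estimate, MI = log2(n_xy * N / (n_x *
n_y)), where N is the number of sentences and the n's are raw counts;
attainable_pmi is a hypothetical helper, and the corpus size below is
made up. When all counts are tiny, only a handful of MI values are
attainable, and they sit far apart.

```python
import math


def attainable_pmi(n_sentences, max_count):
    """List every MI value (in bits) reachable when n_x, n_y <= max_count.

    Assumption: MI = log2(n_xy * N / (n_x * n_y)), with 1 <= n_xy <= min(n_x, n_y).
    """
    values = set()
    for n_x in range(1, max_count + 1):
        for n_y in range(1, max_count + 1):
            for n_xy in range(1, min(n_x, n_y) + 1):
                values.add(math.log2(n_xy * n_sentences / (n_x * n_y)))
    return sorted(values)


# With words seen at most twice, MI can only land on a few widely spaced values.
print(attainable_pmi(1024, 2))
# Allowing higher counts fills in the gaps between them.
print(len(attainable_pmi(1024, 10)))
```

Under these assumptions, pairs of rare words can only land on a sparse
grid of MI values near the top of the range, whereas pairs of frequent
words can take values that are much more finely spaced.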

Best regards,

Justin Washtell
University of Leeds

Quoting Linas Vepstas <linasvepstas at gmail.com>:

> I've recently graphed the probability distribution below; it's the
> mutual information of word-pairs occurring in the same sentence.
>
> I don't recognize the shape. What is it? It is not any of the standard
> probability distributions, as far as I can tell. But surely, I am not
> the first to observe this. It seems to have a fat tail, but that
> doesn't concern me much; it's the oddly misshapen nose at the top, and
> the log-linear sides that have my interest.
>
> I explain just a bit more at http://brainwave.opencog.org/ if that helps.
>
> --linas
>



_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


