[Corpora-List] variant log likelihood calculations

Don Hardy Don.Hardy at Colostate.edu
Wed Dec 15 01:11:52 UTC 2004


In the responses to my question about the exact equivalence of the log
likelihood calculations in Dunning (1993) and Rayson and Garside (2000),
I've been asked what "a" and "b" refer to in the Rayson and Garside
article.

As I understand it, "a" and "b" are the values for the cells in the first
row, "freq of word."  The calculation is

    LL = 2*((a*log(a/E1)) + (b*log(b/E2)))

where E1 is the expected value of "a" and E2 is the expected value of "b":

    E1 = c*(a+b)/N
    E2 = d*(a+b)/N

"c" is the sum of the cells in column 1.  "d" is the sum of the cells in
column 2.  N is the total number of words (c + d).
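
For anyone who wants to try the numbers, here is a minimal Python sketch of
the calculation as I have summarized it above. It is only an illustration,
not code from either paper. The function name log_likelihood is my own, the
natural log is an assumption (the summary just says "log"), and treating a
zero count as contributing 0 to the sum is a common convention that I am
assuming here.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term LL from the summary above (hypothetical helper name).

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: column totals, i.e. total words in corpus 1 and corpus 2
    """
    n = c + d                      # N, total number of words
    e1 = c * (a + b) / n           # expected value of "a"
    e2 = d * (a + b) / n           # expected value of "b"

    def term(observed, expected):
        # Assumption: a zero count contributes 0, avoiding log(0).
        if observed == 0:
            return 0.0
        return observed * math.log(observed / expected)

    return 2 * (term(a, e1) + term(b, e2))


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    print(log_likelihood(20, 10, 1000, 1000))
```

When the word has the same relative frequency in both corpora, a = E1 and
b = E2, so both log terms are log(1) = 0 and LL comes out as 0.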

Apologies to Rayson and Garside and Dunning for any possible inaccuracies in
these summaries.

And, thanks for the responses.

Best,

Don


