[Corpora-List] Log-likelihood (was : Re: Questions about t-score)

Stefan Evert stefan.evert at uos.de
Wed Apr 29 22:21:41 UTC 2009


Hi again!

> I read both of these documents with the greatest interest, since I've
> been using association measures intensively.
> I have a question regarding log-likelihood computed from a contingency
> table. In some cases, I obtain nil values for O_12 or O_21
> (following your notation). The log-likelihood is then undefined,
> because log(O_12/E_12) (or log(O_21/E_21)) is undefined.

If any of the observed frequencies is zero, you simply drop the  
corresponding term from the log-likelihood summation.  The  
mathematical rationale is that in this case

	O_ij * log (O_ij / E_ij) = 0 * log 0 = 0

by continuous extension, because lim[x -> 0+] x * log x = 0.
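
A minimal Python sketch of this rule, assuming the usual form of the statistic, G2 = 2 * sum O_ij * log(O_ij / E_ij), with expected frequencies E_ij = R_i * C_j / N computed from the row and column totals. The function name and argument order are only illustrative:

```python
import math

def log_likelihood(o11, o12, o21, o22):
    """Log-likelihood (G2) for a 2x2 contingency table.

    Assumes G2 = 2 * sum O_ij * log(O_ij / E_ij) with
    E_ij = R_i * C_j / N. Cells with O_ij = 0 are dropped from the sum,
    because x * log x -> 0 as x -> 0+.
    """
    observed = [[o11, o12], [o21, o22]]
    n = o11 + o12 + o21 + o22
    rows = [o11 + o12, o21 + o22]   # row totals R_1, R_2
    cols = [o11 + o21, o12 + o22]   # column totals C_1, C_2

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue            # 0 * log 0 is taken to be 0
            e = rows[i] * cols[j] / n   # E_ij > 0 whenever O_ij > 0
            total += o * math.log(o / e)
    return 2.0 * total

# A table with O_12 = O_21 = 0 still yields a finite score.
print(log_likelihood(10, 0, 0, 10))
```

Note that whenever O_ij > 0, the corresponding row and column totals are also positive, so E_ij > 0 and the division is safe for every term that is actually evaluated.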

> How should such situations be handled to keep a balanced, homogeneous
> score? Most of the time, nil values are simply ignored (log(O_12/E_12)
> is simply replaced by 0), but I feel the log-likelihood computed that
> way can no longer be interpreted correctly.

No, it's mathematically correct to ignore these terms, and
log-likelihood scores can still be interpreted in the normal way.

BTW, most association measures can handle contingency tables with zeroes
(usually O_12 or O_21, but possibly also O_11) if they're properly
implemented (taking care of all special cases); but they will often
break down if the _expected_ frequencies become zero (i.e. for
degenerate contingency tables where an entire row or column is zero).
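
One way to guard against the degenerate case is to compute the expected frequencies first and check them before applying any association measure. The sketch below assumes the usual independence-model formula E_ij = R_i * C_j / N; the function names are hypothetical:

```python
def expected_frequencies(o11, o12, o21, o22):
    """Expected frequencies E_ij = R_i * C_j / N for a 2x2 table."""
    n = o11 + o12 + o21 + o22
    if n == 0:
        raise ValueError("empty contingency table")
    rows = (o11 + o12, o21 + o22)   # row totals R_1, R_2
    cols = (o11 + o21, o12 + o22)   # column totals C_1, C_2
    return [[r * c / n for c in cols] for r in rows]

def is_degenerate(o11, o12, o21, o22):
    """True if any expected frequency is zero, i.e. if an entire
    row or column of the observed table is zero."""
    expected = expected_frequencies(o11, o12, o21, o22)
    return any(e == 0 for row in expected for e in row)

# Zero observed cells alone do not make a table degenerate ...
print(is_degenerate(10, 0, 0, 10))
# ... but an all-zero row does.
print(is_degenerate(10, 5, 0, 0))
```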

Hope this helps,
Stefan

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


