[Corpora-List] Log-likelihood (was: Re: Questions about t-score)
Stefan Evert
stefan.evert at uos.de
Wed Apr 29 22:21:41 UTC 2009
>
Hi again!
> I read both of these documents with the greatest interest, since I've
> been using association measures intensively.
> I have a question regarding log-likelihood computed from a contingency
> table. In some cases, I obtain zero values for O_12 or O_21 (following
> your notation). Therefore, the log-likelihood is undefined, because
> log(O_12/E_12) (or log(O_21/E_21)) is undefined.
If any of the observed frequencies is zero, you simply drop the
corresponding term from the log-likelihood summation. The
mathematical rationale is that in this case
O_ij * log(O_ij / E_ij) = 0 * log(0 / E_ij) = 0
by continuous extension, because lim[x -> 0+] x * log x = 0.
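A minimal Python sketch of this rule for a 2x2 table, assuming the usual form G^2 = 2 * sum O_ij * log(O_ij / E_ij) with E_ij = R_i * C_j / N (R_i, C_j are the row and column totals). The function name log_likelihood is hypothetical, not from any particular toolkit, and the sketch assumes a non-degenerate table (all row and column totals positive).

```python
import math


def log_likelihood(o11, o12, o21, o22):
    """Log-likelihood statistic G^2 for a 2x2 contingency table (hypothetical helper).

    Cells with O_ij == 0 are skipped, i.e. 0 * log(0 / E_ij) is taken as 0
    by continuous extension. Assumes every row and column total is positive.
    """
    observed = [[o11, o12], [o21, o22]]
    n = o11 + o12 + o21 + o22
    rows = [o11 + o12, o21 + o22]  # R_1, R_2
    cols = [o11 + o21, o12 + o22]  # C_1, C_2
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # drop the term: its limit value is 0
            e = rows[i] * cols[j] / n  # expected frequency under independence
            total += o * math.log(o / e)
    return 2 * total


# Usage: a table with O_12 = 0 still yields a finite score, and it agrees
# with a table where O_12 is a tiny positive value (continuity at zero).
print(log_likelihood(10, 0, 5, 985))
print(log_likelihood(10, 1e-9, 5, 985))
```

Dropping the zero term is therefore not an ad-hoc fix: the score it produces is the limit of the score as the cell count shrinks towards zero.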
> How should I handle such a situation to keep a balanced, homogeneous
> score? Most of the time, zero values are simply ignored (log(O_12/E_12)
> is simply replaced by 0), but I feel the log-likelihood computed that
> way cannot be correctly interpreted anymore.
No, it's mathematically correct to ignore these terms, and log-likelihood
scores can still be interpreted in the normal way.
BTW, most association measures handle contingency tables with zeroes
(usually O_12 or O_21, but possibly also O_11) if they're properly
implemented (taking care of all special cases); but they will often
break down if the _expected_ frequencies become zero (i.e. for
degenerate contingency tables where an entire row or column is zero).
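One way to take care of that special case, sketched in Python under the assumption of a 2x2 table with E_ij = R_i * C_j / N: check the marginals before computing anything, since E_ij is zero exactly when row total R_i or column total C_j is zero. The function name expected_frequencies is hypothetical; whether to raise, skip, or assign a default score is a design choice the caller has to make.

```python
def expected_frequencies(o11, o12, o21, o22):
    """Expected frequencies E_ij = R_i * C_j / N for a 2x2 table (hypothetical helper).

    Raises ValueError for a degenerate table (an entire row or column is zero),
    because then some E_ij = 0 and ratios like O_ij / E_ij are undefined.
    A zero *observed* cell such as O_11 = 0 is fine as long as the marginals are positive.
    """
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)  # R_1, R_2
    cols = (o11 + o21, o12 + o22)  # C_1, C_2
    if 0 in rows or 0 in cols:
        raise ValueError(
            f"degenerate contingency table: row totals {rows}, column totals {cols}"
        )
    return [[r * c / n for c in cols] for r in rows]


# Usage: O_11 = 0 is handled, but an all-zero first row is rejected.
print(expected_frequencies(0, 5, 5, 990))
try:
    expected_frequencies(0, 0, 5, 995)
except ValueError as err:
    print(err)
```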
Hope this helps,
Stefan
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora