[Corpora-List] Log-likelihood (was: Re: Questions about t-score)
Emmanuel Prochasson
emmanuel.prochasson at univ-nantes.fr
Sat Apr 25 14:08:23 UTC 2009
Stefan Evert wrote:
> I cannot resist a little bit of self-promotion: you might want to look
> at my PhD thesis
>
> Evert, Stefan (2004). The Statistics of Word Cooccurrences: Word Pairs
> and Collocations. Dissertation, Institut für maschinelle
> Sprachverarbeitung, University of Stuttgart. Published in 2005, URN urn:nbn:de:bsz:93-opus-23714.
>
> or this handbook chapter
>
> Evert, Stefan (2008). Corpora and collocations. In A. Lüdeling and M.
> Kytö (eds.), Corpus Linguistics. An International Handbook, chapter
> 58. Mouton de Gruyter, Berlin.
>
> which have extensive discussions of statistical measures of
> association. Both can be downloaded from my homepage (see below).
>
I have read both of these documents with great interest, since I have been
using association measures intensively.
I have a question regarding the log-likelihood computed from a contingency
table. In some cases, I obtain zero values for O_12 or O_21 (following your
notation). The log-likelihood is then undefined, because log(O_12/E_12)
(or log(O_21/E_21)) is undefined.
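For reference, a minimal Python sketch of the log-likelihood statistic G^2 = 2 * sum_ij O_ij * log(O_ij / E_ij) for a 2x2 table, with E_ij = R_i * C_j / N. It assumes the usual convention 0 * log(0) = 0, i.e. the limit of O * log(O/E) as O -> 0, so an empty cell contributes nothing rather than making the sum undefined. The function names and the counts in the example are hypothetical.

```python
import math

def expected_frequencies(observed):
    """Expected frequencies E_ij = R_i * C_j / N for a 2x2 table."""
    (o11, o12), (o21, o22) = observed
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)  # row sums R_1, R_2
    cols = (o11 + o21, o12 + o22)  # column sums C_1, C_2
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def log_likelihood(observed):
    """G^2 = 2 * sum_ij O_ij * log(O_ij / E_ij), taking 0 * log(0) as 0."""
    expected = expected_frequencies(observed)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                g2 += o * math.log(o / expected[i][j])
            # o == 0: O * log(O/E) -> 0 as O -> 0, so the cell adds nothing
    return 2.0 * g2

table = [[30, 0], [20, 950]]  # hypothetical counts with O_12 = 0
print(log_likelihood(table))
```

Replacing a zero count by a tiny positive value changes the result only negligibly here, which is what one expects if 0 * log(0) = 0 is the continuous extension of the formula.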
However, zero values for O_12 or O_21 are of great interest: they show that
both tokens are highly related, since one of them /never appears/
without the other.
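A small sketch of how such a zero arises, assuming the usual layout in which rows stand for u / not-u and columns for v / not-v, built from the co-occurrence frequency f, the marginal frequencies f1 (of u) and f2 (of v), and the sample size N. The function name and the numbers are hypothetical.

```python
def contingency_table(f, f1, f2, n):
    """Observed 2x2 table for a word pair (u, v).

    f  -- co-occurrence frequency of u and v (O_11)
    f1 -- marginal frequency of u (row sum R_1)
    f2 -- marginal frequency of v (column sum C_1)
    n  -- sample size N
    """
    o11 = f
    o12 = f1 - f           # u occurs without v
    o21 = f2 - f           # v occurs without u
    o22 = n - f1 - f2 + f  # neither u nor v
    if min(o11, o12, o21, o22) < 0:
        raise ValueError("inconsistent frequencies")
    return [[o11, o12], [o21, o22]]

# u occurs 30 times, always together with v, so O_12 = f1 - f = 0
table = contingency_table(f=30, f1=30, f2=50, n=1000)
print(table)
```

O_12 is zero exactly when f1 == f, i.e. when every occurrence of u is also a co-occurrence with v.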
How can such situations be handled while keeping a balanced, homogeneous score? Most
of the time, zero values are simply ignored (log(O_12/E_12) is replaced
by 0), but I feel that a log-likelihood computed that way can no longer
be interpreted correctly. Adding "jitter" to zero values does not
seem clever either, since log(x) drops steeply as x goes from 1 to 0
(so the choice of jitter would have a huge influence).
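To see how much the jitter value matters, one can compare the bare factor log(eps/E_12) with the weighted term eps * log(eps/E_12) that actually enters the log-likelihood sum. This sketch assumes a hypothetical expected frequency E_12 = 28.5 (the value for a table with O_11 = 30, O_12 = 0, O_21 = 20, O_22 = 950) and holds it fixed, which is a simplification: jittering O_12 would also shift the margins slightly.

```python
import math

def jitter_terms(eps, e12):
    """Return (log(eps/E_12), eps * log(eps/E_12)) for a jittered empty cell."""
    log_ratio = math.log(eps / e12)  # bare log factor: diverges as eps -> 0
    weighted = eps * log_ratio       # the cell's contribution O_12 * log(O_12/E_12)
    return log_ratio, weighted

E12 = 28.5  # hypothetical expected frequency, held fixed (assumption)
for eps in (1e-1, 1e-3, 1e-6):
    log_ratio, weighted = jitter_terms(eps, E12)
    print(f"eps={eps:g}  log(eps/E_12)={log_ratio:8.2f}  eps*log(eps/E_12)={weighted:.6f}")
```

The bare log factor depends strongly on the jitter, while the weighted term shrinks toward 0 as eps decreases.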
I would be interested in any suggestions for handling such situations.
Regards,
--
Emmanuel
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora