[Corpora-List] Log-likelihood (was : Re: Questions about t-score)

Emmanuel Prochasson emmanuel.prochasson at univ-nantes.fr
Sat Apr 25 14:08:23 UTC 2009


Stefan Evert wrote:
> I cannot resist a little bit of self-promotion: you might want to look  
> at my PhD thesis
>
> Evert, Stefan (2004). The Statistics of Word Cooccurrences: Word Pairs  
> and Collocations. Dissertation, Institut für maschinelle  
> Sprachverarbeitung, University of Stuttgart. Published in 2005, URN urn:nbn:de:bsz:93-opus-23714.
>
> or this handbook chapter
>
> Evert, Stefan (2008). Corpora and collocations. In A. Lüdeling and M.  
> Kytö (eds.), Corpus Linguistics. An International Handbook, chapter  
> 58. Mouton de Gruyter, Berlin.
>
> which have extensive discussions of statistical measures of  
> association.  Both can be downloaded from my homepage (see below).
>   
I read both of these documents with the greatest interest, since I have 
been using association measures intensively.
I have a question regarding the log-likelihood computed from a contingency 
table. In some cases, I obtain zero values for O_12 or O_21 
(following your notation). The log-likelihood is then undefined, 
because log(O_12/E_12) (or log(O_21/E_21)) is undefined.
However, a zero value for O_12 or O_21 is of great interest: it shows that 
the two tokens are highly related, since one of them /never appears/ 
without the other.
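
To make the problem concrete, here is a minimal sketch in Python. It 
assumes the usual form G2 = 2 * sum of O_ij * log(O_ij / E_ij) over the 
four cells, with E_ij = (row total * column total) / N. The counts and 
the function names (expected, g2_naive) are hypothetical, chosen only 
for illustration.

```python
import math

def expected(table):
    """Expected frequencies E_ij = (row total * column total) / N for a 2x2 table."""
    (o11, o12), (o21, o22) = table
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def g2_naive(table):
    """G2 = 2 * sum O_ij * log(O_ij / E_ij), with no special case for zero cells."""
    exp = expected(table)
    return 2 * sum(
        table[i][j] * math.log(table[i][j] / exp[i][j])
        for i in range(2) for j in range(2)
    )

# Hypothetical counts: O_11 = 30, O_12 = 0, O_21 = 20, O_22 = 9950
table = [[30, 0], [20, 9950]]
try:
    score = g2_naive(table)
except ValueError:
    # math.log(0.0) raises ValueError: the O_12 term cannot be evaluated
    score = None
```

With these counts E_12 is about 29.85, so the pair is far from 
independent, yet the naive score cannot be computed at all.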

How should such situations be handled so as to keep a balanced, homogeneous 
score? Most of the time, zero values are simply ignored (log(O_12/E_12) is 
simply replaced by 0), but I feel that the log-likelihood computed that way 
cannot be interpreted correctly anymore. Adding "jitter" to zero values does 
not seem wise either, since the log function decreases quickly between 1 and 
0 (the choice of jitter will have a huge influence).
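
The sensitivity to the jitter can be checked numerically. The sketch 
below is only an illustration: the expected frequency E_12 = 29.85 and 
the helper name zero_cell_contribution are hypothetical. It prints, for 
several jitter values, both the bare log ratio log(O_12/E_12) and the 
weighted term O_12 * log(O_12/E_12) that would enter the sum.

```python
import math

def zero_cell_contribution(expected, jitter):
    """Return (log ratio, weighted term) for a zero cell replaced by `jitter`."""
    log_ratio = math.log(jitter / expected)
    return log_ratio, jitter * log_ratio

E_12 = 29.85  # hypothetical expected frequency for the zero cell
for jitter in (0.5, 0.1, 0.01, 0.001):
    log_ratio, term = zero_cell_contribution(E_12, jitter)
    print(f"jitter={jitter:<6} log(O/E)={log_ratio:8.3f}  O*log(O/E)={term:8.4f}")
```

The bare log ratio keeps falling as the jitter shrinks, which is the 
influence I am worried about. The weighted term, on the other hand, 
shrinks towards 0, since x * log(x) tends to 0 as x tends to 0. So the 
answer may depend on whether the zero affects the score through the log 
ratio alone or through the weighted term.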

I would be interested in any pointers on how to handle these situations.

Regards,

-- 
Emmanuel

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


