[Corpora-List] Re: Chi-Square

FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE jfidel at siu.buap.mx
Sun Sep 17 13:00:44 UTC 2006


Hi, Crayton, 

I'm no expert in collocations, but when you test their significance you are 
of course working with a fairly large corpus, where many of the cells will 
have well over 100 occurrences. That is beyond the range in which chi-squared 
gives useful results. The basic reason is that the formula for chi-squared 
involves (oversimplifying a bit) a quantity that is squared (hence the name) 
divided by a quantity that grows roughly linearly with the counts. So if 
every cell in a table is multiplied by 10, the statistic is also multiplied 
by 10, even though the proportions in the table have not changed at all. To 
put it simply, when the cells contain numbers much over 100, you are 
virtually guaranteed that chi-squared will produce 'significant' results 
(usually defined as p < .05; that is, if there were really no association 
between the words, a table at least this far from the expected counts would 
turn up less than one time in twenty). This makes this particular test of 
very little use, since almost everything you test for under those 
circumstances comes out 'significant'. Other, more sophisticated, 
statistical tests tend not to be affected by large numbers in the cells in 
this way (they do not become more likely to produce 'significance' just 
because the counts are large), and are therefore more suitable for 
calculating significance in situations where the numbers are large. 
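To make the scaling argument concrete, here is a minimal Python sketch. It 
assumes a 2x2 contingency table of the usual collocation kind (word 1 
followed by word 2, word 1 followed by something else, and so on) and the 
plain Pearson statistic with no continuity correction. The function names 
and the counts are made up for illustration. The upper-tail p-value for 1 
degree of freedom uses the identity p = erfc(sqrt(x/2)), so only the 
standard library is needed.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no Yates correction) for the table [[a, b], [c, d]].

    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / n
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_1df(stat):
    """Upper-tail p-value of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: 12% vs. about 11% -- a tiny departure from independence
small = (12, 88, 100, 800)
for k in (1, 10, 100):
    # Same proportions, k times as much data
    table = tuple(k * x for x in small)
    stat = chi_squared_2x2(*table)
    print(k, round(stat, 3), p_value_1df(stat))
```

With these made-up counts the proportions never change, yet the statistic 
grows tenfold at each step and the p-value falls from about 0.79 to about 
0.40 to below 0.01.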

We hear a lot about chi-squared because it is a relatively easy test to 
apply, and if the numbers are lowish (under 100) but not too low (at least 
4 or 5 per cell), the test usually gives sensible results. 
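The rule of thumb above can also be written as a small check. This is a 
sketch under stated assumptions: the helper names are hypothetical, the 
lower limit of 5 is applied to the *expected* counts (the usual textbook 
reading), and the upper limit of 100 is the informal figure from this 
message applied to the observed counts, not a standard cut-off.

```python
def expected_counts(a, b, c, d):
    """Expected cell counts for [[a, b], [c, d]] if rows and columns were
    independent: row total * column total / grand total."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def chi_squared_rule_of_thumb(a, b, c, d, low=5, high=100):
    """Hypothetical check: True if every expected count is at least `low`
    and no observed count exceeds `high` (the informal limit of 100)."""
    expected = [e for row in expected_counts(a, b, c, d) for e in row]
    observed = (a, b, c, d)
    return min(expected) >= low and max(observed) <= high


print(chi_squared_rule_of_thumb(12, 88, 10, 80))          # True: moderate counts
print(chi_squared_rule_of_thumb(1, 20, 3, 50))            # False: an expected count is below 5
print(chi_squared_rule_of_thumb(30, 500, 2000, 1000000))  # False: corpus-sized counts
```

A real collocation table usually looks like the third case, since the cell 
for pairs containing neither word is close to the size of the whole corpus.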

Jim 


Crayton Walker wrote: 

> A simple question about statistical measures. 
> 
> Could someone explain in very simple terms why we don't normally use
> Chi-square as a measure of collocational significance? We tend to use
> t-score and MI and not Chi-square. Why not? I am not a mathematician so
> would appreciate it if you could keep it simple. 
> 
> Many thanks 
> 
> Crayton Walker 
> 
> University of Birmingham
 


James L. Fidelholtz
Posgrado en Ciencias del Lenguaje, ICSyH
Benemérita Universidad Autónoma de Puebla     MÉXICO 