[Corpora-List] Re: Chi-Square
FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE
jfidel at siu.buap.mx
Sun Sep 17 13:00:44 UTC 2006
Hi, Crayton,
I'm no expert in collocations, but obviously their significance is tested
on the basis of a fairly large corpus, where many of the cells will have
well over 100 occurrences. That is beyond the range where chi squared gives
useful results, basically because the formula for chi squared
involves (oversimplifying a bit) a quantity squared (thus the name) divided
by a quantity that tends to increase more nearly linearly. To put it
simply, where the cells contain numbers much over 100, you are virtually
guaranteed that chi squared will produce 'significant' results (usually
defined as p < .05; that is, if there were no real association, a table this
far from expectation would arise by chance less than one time in twenty).
Obviously, this makes this
particular test of very little use, since almost everything you test
under those circumstances comes out 'significant'. Other, more
sophisticated statistical tests tend not to become more likely to produce
'significance' merely because the numbers in the cells are large, and are
therefore more suitable for calculating significance in situations
where the numbers are large.
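To make the scaling point concrete, here is a minimal Python sketch. The
counts are invented for illustration (they come from no real corpus), and
the helper name chi_squared_2x2 is my own. It computes the plain Pearson
statistic for a 2x2 table, without continuity correction, and then multiplies
every cell by 10 and by 100 while keeping the proportions identical.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (not from any library); no continuity correction.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Critical value of chi squared for p < .05 with 1 degree of freedom.
CRITICAL_05 = 3.841

# Hypothetical counts: same proportions at every scale, only the size changes.
base = (12, 8, 8, 12)
for scale in (1, 10, 100):
    table = tuple(scale * x for x in base)
    stat = chi_squared_2x2(*table)
    print(table, round(stat, 2), "significant" if stat > CRITICAL_05 else "not significant")
```

With these made-up numbers the statistic is 1.6 for the base table, 16 when
every cell is ten times larger, and 160 at a hundred times: the squared
numerator grows faster than the denominator, so the same proportions cross
the .05 threshold simply because the counts got bigger.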
We hear a lot about chi squared because it is a relatively easy test to
apply, and if the numbers are lowish (under 100) but not too low (over 4 or
5), the test usually gives sensible results.
Jim
Crayton Walker wrote:
> A simple question about statistical measures.
>
> Could someone explain in very simple terms why we don't normally use
> Chi-square as a measure of collocational significance? We tend to use
> t-score and MI and not Chi-square. Why not? I am not a mathematician so
> would appreciate it if you could keep it simple.
>
> Many thanks
>
> Crayton Walker
>
> University of Birmingham
James L. Fidelholtz
Posgrado en Ciencias del Lenguaje, ICSyH
Benemérita Universidad Autónoma de Puebla MÉXICO