Use of Chi-square by Yuri Tambovtsev
Yuri Tambovtsev
yutamb at mail.ru
Mon Jun 28 09:47:02 UTC 2010
Adam Kilgarriff wrote that it is not possible to use Chi-square in corpus linguistics. I do not think that is true. One can use Chi-square in linguistics in all cases, on the condition that one keeps to the principle of commensurability. Here that means the two samples must be of equal size. I have counted the occurrence of labial consonants in equal samples of 10000 speech sounds from different Estonian and Russian authors. For instance, in the texts of the Estonian writers Aarne Biin («Moetleja») and Enn Vetemaa («Neitsist Suendinud»), labial consonants occur 896 and 962 times respectively. Can we say that statistically these are the same? We put forward the null hypothesis at the 5% level of significance with one degree of freedom. The theoretical threshold value for Chi-square is then 3.841. The actual Chi-square value should be less than 3.841 for us to state that the occurrence of labial consonants in these two samples is the same. We calculated the Chi-square between 896 and 962. It is 2.344, which is less than 3.841. So the two text samples belong to the same general sample, or in other words, they are statistically the same. I wonder if my reasoning is correct?
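For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes that the figure of 2.344 comes from the two-category goodness-of-fit form of Chi-square, in which each observed count is compared with the mean of the two counts (929 here). The function and constant names are only illustrative.

```python
def chi_square_equal_samples(count_a, count_b):
    """Chi-square for two counts taken from samples of equal size.

    Assumption: under the null hypothesis both counts share one
    expected value, the mean of the two observed counts.
    """
    expected = (count_a + count_b) / 2
    return sum((observed - expected) ** 2 / expected
               for observed in (count_a, count_b))


# Threshold for one degree of freedom at the 5% level, as given in the post.
CRITICAL_VALUE_5_PERCENT_DF1 = 3.841

chi2 = chi_square_equal_samples(896, 962)
print(f"Chi-square = {chi2:.3f}")  # Chi-square = 2.344
print("Below threshold:", chi2 < CRITICAL_VALUE_5_PERCENT_DF1)  # Below threshold: True
```

With these counts the expected value is 929 and each deviation is 33, so the statistic is 2 × 33² / 929, which reproduces the value quoted above.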
