<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18904">
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT size=2 face=Arial><A href="mailto:rlc2010@philol.msu.ru"><FONT
color=#000000><SPAN style="TEXT-DECORATION: none">The Use of Chi-square by Yuri
Tambovtsev</SPAN></FONT></A>
<P style="MARGIN-BOTTOM: 0cm; MARGIN-RIGHT: -2.34cm"><A
href="mailto:rlc2010@philol.msu.ru"><FONT color=#000000><FONT
face="arial, sans-serif"><SPAN style="TEXT-DECORATION: none">Adam Kilgarriff
wrote that it is not possible to use Chi-square</SPAN></FONT></FONT></A><FONT
color=#000000><FONT face="arial, sans-serif"><SPAN
style="TEXT-DECORATION: none"> in corpus linguistics. I do not think this is
true. Chi-square can be used in linguistics in all cases, provided that one
keeps to the principle of commensurability; here, that means the two samples
must be of equal size. I have counted the occurrences of labial consonants in
equal samples of 10,000 speech sounds from different Estonian and Russian authors.
For instance, in the texts «Moetleja» by the Estonian writer Aarne Biin and
«Neitsist Suendinud» by Enn Vetemaa, labial consonants occur 896 and 962 times,
respectively. Can we say that these two frequencies are statistically the same?
We put forward the null hypothesis that they are, and test it at the 5% level
of significance with one degree of freedom. The critical (theoretical threshold)
value of Chi-square is 3.841. The observed Chi-square value must be less than
3.841 for us to state that the occurrence of labial consonants in these two
samples is the same. We calculated Chi-square between 896 and 962; it is 2.344,
which is less than 3.841. So the two text samples belong to the same general
population; in other words, the two frequencies are statistically the same.
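</SPAN></FONT></FONT></P>
<P style="MARGIN-BOTTOM: 0cm; MARGIN-RIGHT: -2.34cm"><FONT color=#000000><FONT
face="arial, sans-serif"><SPAN style="TEXT-DECORATION: none">The arithmetic can
be checked with a minimal Python sketch. It rests on an assumption that is not
stated in the text: that the expected frequency for each sample is the mean of
the two observed counts (929). This assumption does reproduce the value 2.344.
The function name chi_square_equal_expected and the constant name are
hypothetical, chosen only for illustration.</SPAN></FONT></FONT></P>
<PRE>
```python
def chi_square_equal_expected(count_a: int, count_b: int) -> float:
    """Chi-square for two counts taken from equal-sized samples.

    Assumption (not stated in the post): under the null hypothesis both
    samples share one expected frequency, the mean of the two counts.
    """
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


# Critical value for 1 degree of freedom at the 5% level (from the text).
CRITICAL_5_PERCENT_DF1 = 3.841

# Labial consonant counts per 10000 speech sounds (from the text).
chi2 = chi_square_equal_expected(896, 962)

print(round(chi2, 3))                 # 2.344
print(CRITICAL_5_PERCENT_DF1 > chi2)  # True: the null hypothesis is not rejected
```
</PRE>
<P style="MARGIN-BOTTOM: 0cm; MARGIN-RIGHT: -2.34cm"><FONT color=#000000><FONT
face="arial, sans-serif"><SPAN style="TEXT-DECORATION: none">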
</SPAN></FONT></FONT><FONT color=#000000><FONT face="arial, sans-serif"><SPAN
style="TEXT-DECORATION: none">I wonder if my reasoning is correct?
</SPAN></FONT></FONT></P></FONT></DIV></BODY></HTML>