Chi-square

Joseph Davis jdavis at ccny.cuny.edu
Fri Sep 3 21:30:16 UTC 2010


A colleague sent me the June 28 posting by 
Yuri Tambovtsev, quoted below, to which I offer a belated reply that may be useful.

The Use of Chi-square by Yuri Tambovtsev
Adam Kilgariff wrote that it is not possible to 
use Chi-square in corpus linguistics. I do not 
think it is true. One can use Chi-square in 
linguistics in all cases under the condition that 
one keeps to the principle of commensurability. 
That is here, if two samples are equal. I have 
counted the occurrence of labial consonants in 
the equal samples of 10000 speech sounds of 
different Estonian and Russian authors. For 
instance, in the text of the Estonian writer 
Aarne Biin «Moetleja» and Enn Vetemaa «Neitsist 
Suendinud» labial consonants occur 896 and 962 
times. Could we say that statistically it is the 
same? So, we put forward the null hypothesis 
under the 5% level of significance and one degree 
of freedom. The theoretical threshold value for 
Chi-square is 3.841. The actual Chi-square value 
should be less than 3.841 to state that the 
occurrence of labial consonant in these two 
samples is the same. We calculated the Chi-square 
between 896 and 962. It is 2.344. Thus, it is 
less than 3.841. So, the two text samples enter 
the same general sample or in other words it is 
statistically the same. I wonder if my reasoning is correct.
[End of quotation from June 28 posting by Yuri Tambovtsev]
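
For readers who want to check the arithmetic in the quoted posting, here is a minimal sketch. It assumes the figure was obtained as a one-degree-of-freedom goodness-of-fit test in which the two counts are compared against their common mean (the function name is hypothetical, not from the posting). Under that assumption the quoted value of 2.344 is reproduced; a 2x2 contingency table that also counted the non-labial sounds would give a somewhat different value.

```python
def chi_square_equal_counts(a, b):
    """Pearson chi-square for two counts from equal-sized samples (1 df).

    Assumption: the expected count for each sample is the mean of the two
    observed counts, i.e. both texts share one underlying rate.
    """
    expected = (a + b) / 2
    return (a - expected) ** 2 / expected + (b - expected) ** 2 / expected


CRITICAL_5PCT_1DF = 3.841  # threshold quoted in the posting

stat = chi_square_equal_counts(896, 962)
print(round(stat, 3))             # 2.344
print(stat < CRITICAL_5PCT_1DF)   # True: the null hypothesis is not rejected
```

The arithmetic is correct as far as it goes; the question taken up below is whether the test is appropriate for this kind of data at all.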

The main requirement for the use of the 
chi-square test of significance is that the 
observations (data points, tokens) in the sample 
of some population be statistically 
independent.  That is, there should be no 
statistical relation between one observation and 
another in the data set.  It should not be 
possible, given the occurrence of one 
observation, to predict the next observation or 
any other observation.  In my experience, such 
independence among observations typically is not 
a property of connected discourse.  Rather, the 
occurrence of one observation typically raises 
the probability of the same type of observation 
occurring next or later in the discourse, no 
doubt because connected discourse is typically coherent, not random.

For instance, if a text in English concerns 
largely the topic of ‘peace,’ then there will 
likely be many instances in the text of the 
labial [p], due to the frequency of the word 
‘peace’ and related words (‘peaceful,’ ‘pacify,’ 
‘peacenik,’ etc.).  By contrast, if another text 
is about ‘health,’ then it will have a 
disproportionately high frequency of [h], 
relative to [p].  Consequently, given any 
occurrence of a labial in the first text, there 
will be a somewhat elevated probability that 
another labial occurs next or soon, just as an 
occurrence of [h] in the second text makes 
another [h] more likely.  This is statistical dependence, not 
independence.  As a result, chi-square is not 
appropriate as a test of significance; it will 
likely give an inaccurate measure of the degree 
to which the sample of labials is representative 
of the larger population of discourse from which 
the sample was drawn.  (In this case, I suppose 
we can only imagine a hypothetical population of 
“English” discourse from which our text was in 
some idealized sense “drawn,” which is another 
reason the use of a statistical test of significance may be 
inappropriate:  a text is not in any real sense a sample from a population.)
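
The effect of dependence on the test can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of real discourse: each simulated "text" is a sequence of sounds in which a labial is more likely immediately after another labial (a simple two-state chain), while the long-run labial rate stays the same in every text. The rate of 0.09, the text length, the "stickiness" value of 0.5, and all function names are illustrative choices, not taken from any corpus. Since both texts in each pair share the same underlying rate, every rejection is a false alarm.

```python
import random

CRITICAL_5PCT_1DF = 3.841  # 5% threshold for chi-square with 1 df


def count_labials(n, p, stickiness, rng):
    """Count 'labial' tokens in a simulated text of n sounds.

    stickiness = 0 gives independent draws; larger values make a labial
    more likely right after a labial, keeping the long-run rate at p.
    """
    p_after_labial = p + stickiness * (1 - p)
    p_after_other = p * (1 - stickiness)
    is_labial = rng.random() < p
    count = int(is_labial)
    for _ in range(n - 1):
        threshold = p_after_labial if is_labial else p_after_other
        is_labial = rng.random() < threshold
        count += is_labial
    return count


def chi_square_equal_counts(a, b):
    """Chi-square for two counts compared against their common mean."""
    expected = (a + b) / 2
    return (a - expected) ** 2 / expected + (b - expected) ** 2 / expected


def rejection_rate(stickiness, trials=200, n=2000, p=0.09, seed=0):
    """Share of same-rate text pairs that the test declares 'different'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = count_labials(n, p, stickiness, rng)
        b = count_labials(n, p, stickiness, rng)
        if chi_square_equal_counts(a, b) > CRITICAL_5PCT_1DF:
            rejections += 1
    return rejections / trials


independent = rejection_rate(stickiness=0.0)
clustered = rejection_rate(stickiness=0.5)
print("false-alarm rate, independent sounds:", independent)
print("false-alarm rate, clustered sounds:  ", clustered)
```

With independent draws the false-alarm rate should stay near the nominal 5% or below; with clustering it should come out noticeably higher, because the counts vary more from text to text than the test assumes.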

I have a chapter from several years ago that 
addresses this problem in relation to somewhat 
different analytical concerns.  The reference 
is:  Joseph Davis, 2002, “Rethinking the place of 
statistics in Columbia School analysis,” in 
Wallis Reid, Ricardo Otheguy, and Nancy Stern 
(eds.), Signal, meaning, and 
message:  Perspectives on sign-based linguistics 
(pp. 65-90).  Amsterdam/Philadelphia:  John Benjamins.


Joseph Davis, Ph.D.
Associate Professor
School of Education, NAC 6207
The City College
New York, NY  10031 


