[Corpora-List] How to use Chi-square correctly

Rayson, Paul rayson at exchange.lancs.ac.uk
Tue Jul 27 12:22:07 UTC 2010


Hi Yuri,

It is possible to use chi-squared when the samples are of different
sizes, but you need to know the limits within which it is reliable.
Together with two statisticians at Lancaster, I investigated this and
wrote up an experiment to determine those limits:

Rayson P., Berridge D. and Francis B. (2004). Extending the Cochran rule
for the comparison of word frequencies between corpora. In Volume II of
Purnelle G., Fairon C., Dister A. (eds.) Le poids des mots: Proceedings
of the 7th International Conference on Statistical analysis of textual
data (JADT 2004), Louvain-la-Neuve, Belgium, March 10-12, 2004, Presses
universitaires de Louvain, pp. 926-936.

http://www.comp.lancs.ac.uk/computing/users/paul/publications/rbf04_jadt.pdf
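
For readers new to the test, here is a minimal sketch (Python, standard library only) of the usual 2x2 Pearson chi-squared comparison of one word's frequency in two corpora of different sizes. The function name, the example counts and the "every expected count at least 5" check are illustrative assumptions. That check is the simple textbook reading of the Cochran rule, not the extended limits derived in the paper above.

```python
import math

def word_chi_square(freq_a, size_a, freq_b, size_b):
    """Pearson chi-squared (1 df) for one word in corpus A vs corpus B.

    freq_* is the word's count; size_* is the corpus size in tokens.
    Returns (statistic, p_value, min_expected).
    """
    # 2x2 contingency table: rows = corpus, columns = (this word, all other tokens)
    observed = [
        [freq_a, size_a - freq_a],
        [freq_b, size_b - freq_b],
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - freq_a - freq_b]

    stat = 0.0
    min_expected = math.inf
    for i in range(2):
        for j in range(2):
            # Expected counts scale with each corpus's size, so unequal sizes are fine
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, min_expected

# Hypothetical counts: 120 hits in 1,000,000 tokens vs 30 hits in 500,000 tokens
stat, p, min_exp = word_chi_square(freq_a=120, size_a=1_000_000, freq_b=30, size_b=500_000)
if min_exp < 5:  # assumed simplified Cochran-style check
    print("warning: an expected count is below 5; the approximation may be unreliable")
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

The expected count in each cell is (row total x column total) / grand total, which is why corpora of different sizes can be compared directly; what matters for reliability is how small the expected counts get.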

Hope that helps.

Paul.

Dr. Paul Rayson

Director of UCREL and Lecturer in Computer Science 

Computing Department, Infolab21, Lancaster University, Lancaster, LA1 4WA, UK.

Web: http://www.comp.lancs.ac.uk/~paul/

Tel: +44 1524 510357 Fax: +44 1524 510492

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf
Of Yuri Tambovtsev
Sent: 27 July 2010 11:08
To: corpora at uib.no
Subject: [Corpora-List] How to use Chi-square correctly

Dear Corpora colleagues, some American linguists, e.g. Rob Malouf and
Stefan Th. Gries (University of California, Santa Barbara), wrote:

This is especially true when you're comparing really big counts with
really small counts, which is I think what Adam's rule of thumb is meant
to address. Once you've decided that applying the chi-square test even
makes sense, then questions like significance levels and Bonferroni
corrections come into play.

Rob Malouf
Department of Linguistics and Asian / Middle Eastern Languages
San Diego State University
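
Rob Malouf's point about significance levels and Bonferroni corrections can be made concrete with a small sketch. It assumes one chi-squared test (1 degree of freedom) per word and a family-wise significance level of 0.05; the words and statistics below are made-up illustrative values, and the function names are hypothetical. The Bonferroni correction simply judges each of the m tests at alpha / m.

```python
import math

def chi2_1df_p(stat):
    # Upper-tail p-value of chi-squared with 1 df: erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

def bonferroni_significant(stats, alpha=0.05):
    """Return {item: p_value} for the tests that survive a Bonferroni correction.

    stats maps an item (e.g. a word) to its chi-squared statistic (1 df).
    Each test is judged at alpha / m, where m is the number of tests.
    """
    m = len(stats)
    threshold = alpha / m
    result = {}
    for item, s in stats.items():
        p = chi2_1df_p(s)
        if p < threshold:
            result[item] = p
    return result

# Hypothetical statistics for four words
stats = {"the": 0.8, "of": 4.2, "corpus": 9.1, "chi": 15.3}
print(bonferroni_significant(stats))
```

With m = 4 the per-test threshold is 0.0125, so "of" (p of roughly 0.04) would count as significant at 0.05 without the correction, but not with it.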

I wonder if all the linguists on the Corpora list are so advanced in
mathematical statistics. Being a simple linguist, I did not understand
anything. I mean, why should it not be possible to use the chi-square
criterion when the samples are different in size? On the contrary, I
have read in books on chi-square that it can also be used when the
samples are not equal. However, I want to be on the safe side, so I take
equal samples when comparing two transcribed texts. I usually take a
sample of 10000 speech sounds from longer texts: I take sentences from
the long texts at random, and when the sample reaches 10000 sounds I
stop. Is it not possible to use the chi-square in this way?

I am sure that the discussion of how to use and how not to use the
chi-square criterion and other criteria of mathematical statistics in
linguistics is very important. I look forward to hearing your advice at
yutamb at mail.ru

Yours sincerely,
Yuri Tambovtsev, Novosibirsk, Russia
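
The sampling procedure described above can be sketched as follows. This is only an illustration under stated assumptions: each sentence is already transcribed as a string with one character per speech sound (a simplification), sentences are drawn at random without replacement, and the last sentence is cut so that the sample holds exactly 10000 sounds. The function name `sample_sounds` and the toy text are hypothetical.

```python
import random
from collections import Counter

def sample_sounds(sentences, target=10_000, seed=None):
    """Draw random sentences until `target` sounds are collected.

    Each sentence is a string in which every character is one speech sound
    (a simplifying assumption). Sentences are taken without replacement;
    the last one is truncated so that exactly `target` sounds are counted.
    """
    rng = random.Random(seed)
    order = list(sentences)
    rng.shuffle(order)  # random order = random choice of sentences
    sample = []
    for sentence in order:
        needed = target - len(sample)
        if needed <= 0:
            break  # stop once the sample is full
        sample.extend(sentence[:needed])
    if len(sample) < target:
        raise ValueError("text is shorter than the requested sample size")
    return Counter(sample)

# Toy "transcribed text": 5000 short sentences of made-up sound symbols
toy_text = ["pataka", "sima", "lunoti", "kavesa"] * 1250
counts = sample_sounds(toy_text, target=10_000, seed=1)
print(sum(counts.values()), counts.most_common(3))
```

Passing a fixed `seed` makes the draw reproducible. The sound counts from two texts obtained this way could then be compared with a chi-squared test of the kind discussed in this thread.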

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

