[Corpora-List] Statistical tests for corpus studies

Adam Kilgarriff adam.kilgarriff at itri.brighton.ac.uk
Wed May 7 09:45:19 UTC 2003


Josephine,

chi-square will probably not give you what you want, nor will
log-likelihood; my paper "Comparing Corpora" (Int Jnl Corpus
Linguistics 2001) explains why.  Non-parametric tests are more suitable:
I found the Mann-Whitney test did the job well.  It involves chopping
each corpus up into same-size slices and comparing the word's frequency
across the slices of the two corpora.
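
To make the slicing idea concrete, here is a minimal sketch in Python. It is
an illustration under stated assumptions, not necessarily the exact procedure
from the paper: the function names (slice_counts, mann_whitney_u) are
hypothetical, each corpus is assumed to be a plain list of tokens, the
incomplete final slice is simply dropped, and the p-value uses the normal
approximation with a tie correction and no continuity correction, which is
only reasonable when each corpus yields a fair number of slices.

```python
import math
from itertools import chain


def slice_counts(tokens, targets, slice_size):
    """Count target-word hits in each full slice of `slice_size` tokens."""
    n_slices = len(tokens) // slice_size  # the incomplete tail slice is dropped
    return [
        sum(1 for tok in tokens[i * slice_size:(i + 1) * slice_size]
            if tok in targets)
        for i in range(n_slices)
    ]


def mann_whitney_u(xs, ys):
    """Return (U for xs, two-sided p) using the normal approximation."""
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted(chain(((v, 0) for v in xs), ((v, 1) for v in ys)))
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return u, p
```

Typical use would be to call slice_counts on each corpus with the same
slice_size and the same set of target words, then pass the two lists of
per-slice counts to mann_whitney_u. If SciPy is available, its
scipy.stats.mannwhitneyu function should do the second step for you.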

Regards,

    Adam

Josephine Lo wrote:

> Dear all,
>
> As a layman in statistics, I would like some advice on tests
> suitable for comparing the frequency of a specific type of word in
> corpora of different genres. I have in mind Chi-square and ANOVA,
> but I'm not sure they are the appropriate ones.
>
> Thanks in advance
>
>
> Josephine Lo
> Research Assistant
> Dept. of English and Communication
> City University of Hong Kong
>

--

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Adam Kilgarriff
ITRI, University of Brighton                   tel: (44) 1273 642919
Lewes Road, Brighton BN2 4GJ, UK               fax: (44) 1273 642908
adam at itri.bton.ac.uk     http://www.itri.bton.ac.uk/~Adam.Kilgarriff
  and
Lexicography MasterClass Ltd.
71 Freshfield Road, Brighton BN2 0BL, UK       tel: (44) 1273 705773
adam at lexmasterclass.com                http://www.lexmasterclass.com
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


