[Corpora-List] Help in Applying Appropriate Statistical Test and Its Interpretation

"Thomas François" Thomas.Francois at uclouvain.be
Mon Jun 28 08:36:14 UTC 2010


Dear Muhammad Shakir Aziz,

While working with a large SMS corpus, I ran into the same situation you
describe. Kilgarriff's (2005) article is very instructive and helped me
understand the problem better.

I also recommend reading:

Grissom, R. J. and Kim, J. J. (2005). Effect Sizes for Research: A Broad
Practical Approach. Mahwah, NJ: Lawrence Erlbaum Associates.

They propose using measures of association (such as correlation
coefficients) rather than a significance test such as chi-squared when
working with large amounts of data (as is now often the case in corpus
linguistics). Association measures tell you more about the size of the
effect between your variables: with a lot of data, you can get a highly
significant X² that corresponds to only a weak association.
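
To make the contrast between significance and effect size concrete, here is a minimal sketch that computes Pearson's chi-squared and one common association measure, Cramér's V, for the genre table quoted further down. Two assumptions to flag: Cramér's V is chosen here only as an illustration (it is not necessarily the measure Grissom and Kim would recommend for this design), and the sketch is written in plain Python rather than R, with hypothetical helper names `chi_squared` and `cramers_v`.

```python
import math

# Counts copied from the table in the original question:
# genre -> (Double Object, To Dative)
counts = {
    "ALT": (0, 4), "ART": (210, 344), "BKS": (335, 308), "BLT": (2, 7),
    "BRU": (4, 2), "CLM": (108, 303), "CST": (0, 7), "DIR": (1, 7),
    "EDT": (8, 32), "FTW": (23, 14), "INT": (38, 44), "LDS": (7, 53),
    "LTR": (35, 92), "MGP": (2, 5), "MNF": (3, 6), "MNU": (0, 1),
    "NLT": (7, 23), "NVL": (5, 3), "NWS": (24, 108), "OLT": (44, 9),
    "PLC": (0, 1), "PRS": (11, 22), "RPR": (19, 60), "RPT": (4, 17),
    "SRY": (0, 7), "STR": (76, 36), "THS": (20, 36), "TRN": (30, 19),
    "WWW": (16, 30),
}


def chi_squared(table):
    """Pearson chi-squared (no continuity correction), df and sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: chi-squared rescaled to the range 0..1."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


table = list(counts.values())
chi2, df, n = chi_squared(table)
v = cramers_v(chi2, n, len(table), 2)
print(f"chi2 = {chi2:.2f}, df = {df}, N = {n}, Cramer's V = {v:.3f}")
```

On these counts the sketch should reproduce the X-squared and df reported in the question, while V comes out at roughly 0.32: the p-value is tiny, but the strength of the association between genre and pattern is a separate and much less dramatic quantity.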

Regards

Thomas François


> Muhammad Shakir Aziz,
>
> the null-hypothesis testing you discuss here doesn't work in corpus
> linguistics. For the argument, see "Language is never ever ever random",
> 2005, *Corpus Linguistics and Linguistic Theory* 1 (2): 263-276.
> <http://kilgarriff.co.uk/Publications/2005-K-lineer.pdf>
>
> My rule of thumb is: a difference only counts if the normalised
> frequencies in the two text types differ by at least a factor of two.
>
> Regards
>
> Adam
>
> On 28 June 2010 05:25, True Friend <true.friend2004 at gmail.com> wrote:
>
>> Good day to all Corpora members,
>> I am a master's student in applied linguistics, currently working on my
>> thesis. The topic of research is the use of ditransitive constructions.
>> To validate the results I want to apply statistical techniques to the
>> research. For example, I am trying to see whether there is a significant
>> difference in the usage of two alternative ditransitive patterns in PWE
>> (Pakistani Written English, the corpus I am working on for the research).
>> The alternative ditransitive patterns here are Double Object (He gave me
>> a pen) and To Dative (He gave a pen to me). I am pasting the table here,
>> which contains genre names and the frequencies of all verbs (used
>> ditransitively) in each genre.
>> Genre  D. Object  To Dative
>> ALT            0          4
>> ART          210        344
>> BKS          335        308
>> BLT            2          7
>> BRU            4          2
>> CLM          108        303
>> CST            0          7
>> DIR            1          7
>> EDT            8         32
>> FTW           23         14
>> INT           38         44
>> LDS            7         53
>> LTR           35         92
>> MGP            2          5
>> MNF            3          6
>> MNU            0          1
>> NLT            7         23
>> NVL            5          3
>> NWS           24        108
>> OLT           44          9
>> PLC            0          1
>> PRS           11         22
>> RPR           19         60
>> RPT            4         17
>> SRY            0          7
>> STR           76         36
>> THS           20         36
>> TRN           30         19
>> WWW           16         30
>>
>> Some facts about the data are as follows:
>> The genres are not equal in length (number of words), so there may be a
>> genre like ALT of a few hundred words and another like ART of 0.5 million
>> words.
>> Frequencies here are calculated by adding up the occurrences of all the
>> verbs that occur in the given genre in a given pattern.
>> I applied the chi-squared test in R with the command "cxx =
>> chisq.test(x, correct = FALSE)" (where 'x' and 'cxx' are R objects), and
>> the result was as follows.
>> Pearson's Chi-squared test
>>
>> data:  x
>> X-squared = 268.2688, df = 28, p-value < 2.2e-16
>>
>> Going through the R help manuals, I learned that a p-value of '2.2e-16'
>> is a very small number, so it means that the difference between the two
>> variables (Double Object and To Dative) is significant, as the
>> significance threshold in the social sciences is usually taken to be
>> p<0.05. Please correct me if I am misunderstanding the test or its
>> results, or applying it incorrectly. If this test is not suitable for
>> this kind of analysis, which kind of test should I apply instead? One
>> last thing: I also applied the test to normalized frequencies (calculated
>> by dividing the frequency for each genre by the number of words in that
>> genre and multiplying by 100,000, i.e. 0.1 million), but the chi-square
>> result was the same (same p-value).
>> Any help and comments would be highly appreciated.
>> Best Regards
>>
>> --
>> Muhammad Shakir Aziz محمد شاکر عزیز
>> Masters in Applied Linguistics (last semester student)
>> Translator, Course Developer, Linguist for Urdu, Punjabi and English
>> Urdu:- http://awaz-e-dost.blogspot.com/
>> English:- http://linguisticslearner.blogspot.com/
>> Facebook:- http://www.facebook.com/truefriend2004
>> Skype:- true_friend2004
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>
>
> --
> ================================================
> Adam Kilgarriff
> http://www.kilgarriff.co.uk
> Lexical Computing Ltd                   http://www.sketchengine.co.uk
> Lexicography MasterClass Ltd      http://www.lexmasterclass.com
> Universities of Leeds and Sussex       adam at lexmasterclass.com
> ================================================
>


