[Corpora-List] Help in Applying Appropriate Statistical Test and Its Interpretation

Adam Kilgarriff adam at lexmasterclass.com
Mon Jun 28 04:54:42 UTC 2010


Muhammad Shakir Aziz,

the null-hypothesis testing you discuss here doesn't work in corpus
linguistics. For the argument, see "Language is never, ever, ever, random",
2005, *Corpus Linguistics and Linguistic Theory* 1 (2): 263-276:
<http://kilgarriff.co.uk/Publications/2005-K-lineer.pdf>

My rule of thumb is: a difference only counts if the normalised frequencies
in the two text types differ by more than a factor of two, i.e. their ratio
is greater than 2 or less than 0.5.
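
As a rough illustration of this rule of thumb, here is a minimal Python sketch. The function names and all counts and corpus sizes in it are hypothetical, chosen only for the example; they are not taken from the data in this thread. It also assumes the "factor of two" is meant in either direction (ratio above 2 or below 0.5).

```python
def per_100k(count, corpus_words):
    """Normalise a raw count to occurrences per 100,000 words."""
    return count * 100_000 / corpus_words


def differs_by_factor_of_two(freq_a, freq_b):
    """Return True if two normalised frequencies differ by more than a factor of two.

    Assumption: the rule applies in both directions (ratio > 2 or ratio < 0.5).
    """
    if freq_a == 0 or freq_b == 0:
        # No finite ratio exists; treat "one zero, one non-zero" as a difference,
        # although with very small counts this should be read with caution.
        return freq_a != freq_b
    ratio = freq_a / freq_b
    return ratio > 2 or ratio < 0.5


# Hypothetical text types A and B (counts and sizes invented for illustration).
freq_a = per_100k(210, 500_000)   # 42.0 per 100,000 words
freq_b = per_100k(24, 100_000)    # 24.0 per 100,000 words
print(differs_by_factor_of_two(freq_a, freq_b))  # False: the ratio is 1.75
```

Note that the comparison is made on normalised frequencies, so the size of each text type has to be known before the rule can be applied.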

Regards

Adam

On 28 June 2010 05:25, True Friend <true.friend2004 at gmail.com> wrote:

> Good Day to All Corpora Members
> I am a master's student in applied linguistics, currently working on my
> thesis. The topic of my research is the use of ditransitive constructions. To
> validate the results I want to apply statistical techniques to the
> data. For example, I am trying to see whether there is a significant
> difference in the usage of two alternative ditransitive patterns in PWE
> (Pakistani Written English, the corpus I am working on for the research).
> The alternative ditransitive patterns here are the Double Object (He gave me a
> pen) and the To Dative (He gave a pen to me). I am pasting the table here, which
> contains genre names and the frequencies of all verbs (used ditransitively) in
> each genre.
>
> Genre  D. Object  To Dative
> ALT            0          4
> ART          210        344
> BKS          335        308
> BLT            2          7
> BRU            4          2
> CLM          108        303
> CST            0          7
> DIR            1          7
> EDT            8         32
> FTW           23         14
> INT           38         44
> LDS            7         53
> LTR           35         92
> MGP            2          5
> MNF            3          6
> MNU            0          1
> NLT            7         23
> NVL            5          3
> NWS           24        108
> OLT           44          9
> PLC            0          1
> PRS           11         22
> RPR           19         60
> RPT            4         17
> SRY            0          7
> STR           76         36
> THS           20         36
> TRN           30         19
> WWW           16         30
>
> Some facts about the data are as follows:
> The genres are not of equal length (number of words), so there may be a genre
> like ALT of a few hundred words, and another like ART of 0.5 million words.
> The frequencies here are calculated by adding up the occurrences of all the
> verbs that occur in the given genre in a given pattern.
> I have applied the Chi Square test in R with the command "cxx =
> chisq.test(x, correct = FALSE)" (where 'x' and 'cxx' are R objects), and the
> result was as follows.
> Pearson's Chi-squared test
>
> data:  x
> X-squared = 268.2688, df = 28, p-value < 2.2e-16
>
> Going through the help manuals of R, I came to know that the p-value '2.2e-16'
> is a very small number, so it means that the difference between the two
> variables (Double Object and To Dative) is significant, as the p-value
> threshold for social sciences is considered to be p<0.005. Please correct me
> if I am misunderstanding the test or its results, or applying it incorrectly.
> And if this test is not suitable for this kind of analysis, which
> kind of test should I apply instead? One last thing: I applied the test to
> normalized frequencies (which were calculated by dividing the frequency for
> each genre by the number of words it has, and then multiplying it by
> 100,000, i.e. 0.1 million), but the chi-square result was the same (same p-value).
> Any help and comments would be highly appreciated.
> Best Regards
>
> --
> Muhammad Shakir Aziz محمد شاکر عزیز
> Masters in Applied Linguistics (last semester student)
> Translator, Course Developer, Linguist for Urdu, Punjabi and English
> Urdu:- http://awaz-e-dost.blogspot.com/
> English:- http://linguisticslearner.blogspot.com/
> Facebook:- http://www.facebook.com/truefriend2004
> Skype:- true_friend2004
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
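
For readers who want to re-derive the statistic reported in the quoted message without R, the following is a minimal Python sketch of Pearson's chi-squared computation (no continuity correction) on the genre table as pasted above. The names `COUNTS` and `pearson_chi_squared` are illustrative and not from the original message. The sketch only reproduces the arithmetic of the test; it does not address whether the test is appropriate for corpus data, which is the point at issue in the reply.

```python
# Counts copied from the table in the quoted message:
# genre -> (double object, to-dative)
COUNTS = {
    "ALT": (0, 4), "ART": (210, 344), "BKS": (335, 308), "BLT": (2, 7),
    "BRU": (4, 2), "CLM": (108, 303), "CST": (0, 7), "DIR": (1, 7),
    "EDT": (8, 32), "FTW": (23, 14), "INT": (38, 44), "LDS": (7, 53),
    "LTR": (35, 92), "MGP": (2, 5), "MNF": (3, 6), "MNU": (0, 1),
    "NLT": (7, 23), "NVL": (5, 3), "NWS": (24, 108), "OLT": (44, 9),
    "PLC": (0, 1), "PRS": (11, 22), "RPR": (19, 60), "RPT": (4, 17),
    "SRY": (0, 7), "STR": (76, 36), "THS": (20, 36), "TRN": (30, 19),
    "WWW": (16, 30),
}


def pearson_chi_squared(table):
    """Return (statistic, degrees of freedom) for a table of counts.

    Expected counts are row_total * column_total / grand_total,
    and no continuity correction is applied.
    """
    rows = [list(row) for row in table]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = pearson_chi_squared(COUNTS.values())
print(f"X-squared = {statistic:.4f}, df = {df}")
```

If the table has been transcribed correctly, the printed values should agree with the `X-squared = 268.2688, df = 28` shown in the quoted R output. The p-value is not computed here; one option, assuming SciPy is available, would be `scipy.stats.chi2.sf(statistic, df)`.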


-- 
================================================
Adam Kilgarriff
http://www.kilgarriff.co.uk
Lexical Computing Ltd                   http://www.sketchengine.co.uk
Lexicography MasterClass Ltd      http://www.lexmasterclass.com
Universities of Leeds and Sussex       adam at lexmasterclass.com
================================================