[Corpora-List] Help in Applying Appropriate Statistical Test and Its Interpretation
Adam Kilgarriff
adam at lexmasterclass.com
Mon Jun 28 04:54:42 UTC 2010
Muhammad Shakir Aziz,
The null-hypothesis testing you discuss here doesn't work in corpus
linguistics. For the argument, see:
Language is never, ever, ever, random.
<http://kilgarriff.co.uk/Publications/2005-K-lineer.pdf>
2005. *Corpus Linguistics and Linguistic Theory* 1 (2): 263-276.
My rule of thumb is: a difference only counts if the normalised frequencies
in the two text types differ by more than a factor of two (that is, the
ratio is greater than 2 or less than 0.5).
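
A minimal sketch of this rule of thumb, for illustration only. The function
names and the word counts in the example are hypothetical (the thread gives
no per-genre word counts); the per-100,000-words normalisation follows the
quoted message below.

```python
def per_100k(count, corpus_words):
    """Normalised frequency: occurrences per 100,000 words."""
    return count * 100_000 / corpus_words


def factor_of_two(count_a, words_a, count_b, words_b):
    """Compare two text types by the ratio of their normalised frequencies.

    Returns (ratio, meets_rule), where meets_rule is True when the ratio
    is greater than 2 or less than 0.5. Returns (None, None) when either
    frequency is zero, because the ratio is then undefined or zero and
    needs a separate decision.
    """
    freq_a = per_100k(count_a, words_a)
    freq_b = per_100k(count_b, words_b)
    if freq_a == 0 or freq_b == 0:
        return None, None
    ratio = freq_a / freq_b
    return ratio, (ratio > 2 or ratio < 0.5)


# Hypothetical example: 210 hits in a 500,000-word text type versus
# 335 hits in a 250,000-word text type (word counts are invented).
ratio, meets_rule = factor_of_two(210, 500_000, 335, 250_000)
print(f"ratio = {ratio:.3f}, counts as a difference: {meets_rule}")
```

The comparison is made on normalised frequencies rather than raw counts, so
text types of very different sizes can be compared directly.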
Regards
Adam
On 28 June 2010 05:25, True Friend <true.friend2004 at gmail.com> wrote:
> Good day to all Corpora members,
> I am a master's student in applied linguistics, currently working on my
> thesis. The topic of research is the use of ditransitive constructions. To
> validate the results I want to apply statistical techniques to the
> research. For example, I am trying to see whether there is a significant
> difference in the usage of two alternative ditransitive patterns in PWE
> (Pakistani Written English, the corpus I am working on for the research).
> The alternative ditransitive patterns here are Double Object (He gave me a
> pen) and To Dative (He gave a pen to me). I am pasting the table here, which
> contains genre names and the frequencies of all verbs (used ditransitively)
> in each genre.
>
> Genre  D. Object  To Dative
> ALT            0          4
> ART          210        344
> BKS          335        308
> BLT            2          7
> BRU            4          2
> CLM          108        303
> CST            0          7
> DIR            1          7
> EDT            8         32
> FTW           23         14
> INT           38         44
> LDS            7         53
> LTR           35         92
> MGP            2          5
> MNF            3          6
> MNU            0          1
> NLT            7         23
> NVL            5          3
> NWS           24        108
> OLT           44          9
> PLC            0          1
> PRS           11         22
> RPR           19         60
> RPT            4         17
> SRY            0          7
> STR           76         36
> THS           20         36
> TRN           30         19
> WWW           16         30
>
> Some facts about the data are as follows:
> The genres are not equal in length (number of words), so there may be a
> genre like ALT of a few hundred words, and another like ART of 0.5 million
> words. The frequencies here are calculated by summing the occurrences of
> all the verbs that occurred in the given genre in a given pattern.
> I applied the chi-square test in R with the command "cxx =
> chisq.test(x, correct = FALSE)" (where 'x' and 'cxx' are R objects), and the
> result was as follows.
> Pearson's Chi-squared test
>
> data: x
> X-squared = 268.2688, df = 28, p-value < 2.2e-16
>
> Going through the R help manuals, I learned that the p-value '2.2e-16'
> is a very small number, so it means that the difference between the two
> variables (Double Object and To Dative) is significant, as the p-value
> threshold in the social sciences is considered to be p<0.005. Please
> correct me if I am misunderstanding the test or its results, or applying it
> incorrectly. If this test is not suitable for this kind of analysis, which
> kind of test should I apply instead? One last thing: I also applied the
> test to normalized frequencies (calculated by dividing the frequency in
> each genre by the number of words it has, and then multiplying by
> 100,000, i.e. 0.1 million), but the chi-square result was the same (same
> p-value).
> Any help and comments would be highly appreciated.
> Best Regards
>
> --
> Muhammad Shakir Aziz محمد شاکر عزیز
> Masters in Applied Linguistics (last semester student)
> Translator, Course Developer, Linguist for Urdu, Punjabi and English
> Urdu:- http://awaz-e-dost.blogspot.com/
> English:- http://linguisticslearner.blogspot.com/
> Facebook:- http://www.facebook.com/truefriend2004
> Skype:- true_friend2004
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
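
For readers who want to see what the chisq.test call in the quoted message
computes, here is a minimal sketch of Pearson's chi-squared statistic (no
continuity correction) applied to the raw counts from the quoted table. The
function name is hypothetical and only the Python standard library is used,
so no p-value is computed. It reproduces the arithmetic only and does not
address the objection above that such tests are inappropriate for corpus
data.

```python
# Counts transcribed from the table in the quoted message:
# genre -> (Double Object, To Dative)
counts = {
    "ALT": (0, 4), "ART": (210, 344), "BKS": (335, 308), "BLT": (2, 7),
    "BRU": (4, 2), "CLM": (108, 303), "CST": (0, 7), "DIR": (1, 7),
    "EDT": (8, 32), "FTW": (23, 14), "INT": (38, 44), "LDS": (7, 53),
    "LTR": (35, 92), "MGP": (2, 5), "MNF": (3, 6), "MNU": (0, 1),
    "NLT": (7, 23), "NVL": (5, 3), "NWS": (24, 108), "OLT": (44, 9),
    "PLC": (0, 1), "PRS": (11, 22), "RPR": (19, 60), "RPT": (4, 17),
    "SRY": (0, 7), "STR": (76, 36), "THS": (20, 36), "TRN": (30, 19),
    "WWW": (16, 30),
}


def pearson_chi_squared(table):
    """Return (statistic, degrees_of_freedom) for a genre -> counts table."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Expected count under independence of genre and pattern.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = pearson_chi_squared(counts)
print(f"X-squared = {statistic:.4f}, df = {df}")
```

Several genres (for example MNU and PLC) have only one occurrence in total,
so their expected counts are far below 5, a condition under which the
chi-squared approximation is commonly considered unreliable.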
--
================================================
Adam Kilgarriff
http://www.kilgarriff.co.uk
Lexical Computing Ltd http://www.sketchengine.co.uk
Lexicography MasterClass Ltd http://www.lexmasterclass.com
Universities of Leeds and Sussex adam at lexmasterclass.com
================================================