[Corpora-List] Help in Applying Appropriate Statistical Test and Its Interpretation

True Friend true.friend2004 at gmail.com
Mon Jun 28 04:25:41 UTC 2010


Good day to all Corpora members,

I am a master's student in applied linguistics, currently working on my
thesis. The topic of my research is the use of ditransitive constructions.
To validate the results, I want to apply statistical tests to the data. For
example, I am trying to see whether there is a significant difference in the
usage of two alternative ditransitive patterns in PWE (Pakistani Written
English, the corpus I am working on for the research). The two alternative
patterns are the Double Object (He gave me a pen) and the To Dative (He gave
a pen to me). I am pasting the table here; it gives the genre names and, for
each genre, the frequencies of all verbs used ditransitively in each
pattern.

Genre   D. Object   To Dative
ALT             0           4
ART           210         344
BKS           335         308
BLT             2           7
BRU             4           2
CLM           108         303
CST             0           7
DIR             1           7
EDT             8          32
FTW            23          14
INT            38          44
LDS             7          53
LTR            35          92
MGP             2           5
MNF             3           6
MNU             0           1
NLT             7          23
NVL             5           3
NWS            24         108
OLT            44           9
PLC             0           1
PRS            11          22
RPR            19          60
RPT             4          17
SRY             0           7
STR            76          36
THS            20          36
TRN            30          19
WWW            16          30

Some facts about the data are as follows:
The genres are not of equal length (in number of words), so there may be a
genre like ALT with a few hundred words and another like ART with 0.5
million words. The frequencies here were calculated by adding up the
occurrences of all the verbs that occur in the given genre in the given
pattern.
I applied a chi-square test in R with the command "cxx = chisq.test(x,
correct = FALSE)" (where 'x' and 'cxx' are R objects), and the result was as
follows.
Pearson's Chi-squared test

data:  x
X-squared = 268.2688, df = 28, p-value < 2.2e-16
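
For anyone who wants to reproduce this, here is a minimal sketch of how the matrix 'x' could be built from the table above. The object names (genre, d_object, to_dative) are my own choices for illustration. The last three steps (counting small expected cells, standardized residuals, and a simulated p-value) are optional extras that were not part of the run reported above, so the printed numbers may not be identical to the output shown.

```r
# Counts copied from the table above: one row per genre, one column per pattern.
genre <- c("ALT", "ART", "BKS", "BLT", "BRU", "CLM", "CST", "DIR", "EDT", "FTW",
           "INT", "LDS", "LTR", "MGP", "MNF", "MNU", "NLT", "NVL", "NWS", "OLT",
           "PLC", "PRS", "RPR", "RPT", "SRY", "STR", "THS", "TRN", "WWW")
d_object  <- c(0, 210, 335, 2, 4, 108, 0, 1, 8, 23,
               38, 7, 35, 2, 3, 0, 7, 5, 24, 44,
               0, 11, 19, 4, 0, 76, 20, 30, 16)
to_dative <- c(4, 344, 308, 7, 2, 303, 7, 7, 32, 14,
               44, 53, 92, 5, 6, 1, 23, 3, 108, 9,
               1, 22, 60, 17, 7, 36, 36, 19, 30)

# 29 x 2 contingency table (names are illustrative assumptions).
x <- cbind(DObject = d_object, ToDative = to_dative)
rownames(x) <- genre

# Same call as in the message. R issues a warning here because several
# genres have expected counts below 5.
cxx <- chisq.test(x, correct = FALSE)

cxx$parameter          # degrees of freedom: (29 - 1) * (2 - 1)
sum(cxx$expected < 5)  # how many cells have an expected count below 5
round(cxx$stdres, 2)   # standardized residuals: which genres deviate most

# Optional: Monte Carlo p-value, which does not rely on the
# large-sample chi-square approximation.
set.seed(1)
chisq.test(x, simulate.p.value = TRUE, B = 10000)
```

The residuals are included because the overall test only says that the two patterns are not distributed identically across genres; it does not say which genres are responsible.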

Going through the R help manuals, I learned that a p-value below '2.2e-16'
is an extremely small number, so it means that the difference between the
two variables (Double Object and To Dative) is significant, since the
conventional threshold in the social sciences is p < 0.05. Please correct me
if I am misunderstanding the test or its results, or applying it
incorrectly. If this test is not suitable for this kind of analysis, which
test should I apply instead? One last thing: I also applied the test to
normalized frequencies (calculated by dividing the frequency in each genre
by the number of words in that genre, and then multiplying by 100,000, i.e.
0.1 million), but the chi-square result was the same (same p-value).
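
To make the normalization explicit, here is a small sketch of the calculation for two genres. The To Dative counts come from the table, but the word counts are hypothetical placeholders (I only described ALT as "a few hundred words" and ART as about 0.5 million), and the function name normalize is made up for this illustration.

```r
# Normalized frequency: occurrences per `per` words of running text.
normalize <- function(freq, n_words, per = 1e5) freq / n_words * per

# To Dative counts from the table above.
to_dative <- c(ALT = 4, ART = 344)

# HYPOTHETICAL word counts, for illustration only.
n_words <- c(ALT = 400, ART = 500000)

# Occurrences per 100,000 words in each genre.
normalize(to_dative, n_words)
```

With these placeholder sizes, the 4 occurrences in ALT correspond to a much higher rate per 100,000 words than the 344 occurrences in ART, which is why the raw and normalized tables can look very different.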
Any help and comments would be highly appreciated.
Best Regards

-- 
Muhammad Shakir Aziz محمد شاکر عزیز
Masters in Applied Linguistics (last semester student)
Translator, Course Developer, Linguist for Urdu, Punjabi and English
Urdu:- http://awaz-e-dost.blogspot.com/
English:- http://linguisticslearner.blogspot.com/
Facebook:- http://www.facebook.com/truefriend2004
Skype:- true_friend2004