[Corpora-List] Help with Biber's MD analysis

Sat Feb 5 10:14:33 UTC 2011

Dear all,

I am a PhD student researching forensic linguistics using corpus
methodologies. I am particularly interested in Biber's MD framework and
would like to apply it to the analysis of my dataset. Unfortunately, I have
run into several problems, and I was hoping that someone on the Corpora List
would be kind enough to help me out.

What I would like to do is calculate the dimension scores for my texts in
order to see where they fall in Biber's dimensional space. For example, I
want to analyse Text A for the 67 variables that Biber used, calculate the
factor score for that text on each dimension, and then compare those factor
scores with the means that Biber himself provides in his book in order to
see where that text falls.

I am encountering several problems. First of all, not being very competent
with statistics, I wonder whether it is even possible to do such a thing.
Secondly, in Biber's study all the frequencies are normalised to 1000 words,
whereas my texts are just a few hundred words long and would need to be
normalised to 100 words. Could this affect the results in any way?
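
To make the normalisation question concrete, here is a minimal sketch of the arithmetic, assuming that "normalised to N words" means the raw count divided by the text length and multiplied by N. The function name `normalise` and the example counts are hypothetical. It shows only that the two bases differ by a constant factor of 10, so frequencies on one basis can be rescaled to the other; it does not address how reliable a rate estimated from a very short text is.

```python
def normalise(count: int, text_length: int, basis: int = 1000) -> float:
    """Return the rate of a feature per `basis` words of running text."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return count * basis / text_length


# Hypothetical example: a feature that occurs once in a 200-word text.
per_1000 = normalise(1, 200)          # basis of 1000 words
per_100 = normalise(1, 200, 100)      # basis of 100 words

# The two rates differ only by the ratio of the bases (1000 / 100 = 10).
print(per_1000, per_100, per_100 * 10)
```

If the comparison is with means published per 1000 words, the frequencies for the new texts would need to be expressed on that same basis before standardising.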

According to what I have read in the literature, the procedure is to
standardise the frequencies to a mean of 0.0 and a standard deviation of
1.0, and then sum the standardised scores of the positive features and
subtract those of the negative ones. As far as I understand, to standardise
a frequency one has to apply the following formula: [((normalised frequency
of the feature for the text I want to calculate the score for) minus (the
mean for that feature for the whole genre)) divided by (the standard
deviation for that feature for the whole genre)].
A question here, though, is whether the means for the whole genre should be
the ones from my own dataset or the ones that Biber published in the
appendix of his "Variation Across Speech and Writing".
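
The procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the feature names, their assignment to the positive or negative side of a dimension, and all means and standard deviations are made-up placeholders, not values from Biber's appendix, and the helper names `z_score` and `dimension_score` are hypothetical.

```python
def z_score(freq: float, mean: float, sd: float) -> float:
    """Standardise a normalised frequency against a reference mean and SD."""
    return (freq - mean) / sd


def dimension_score(freqs, reference, positive, negative):
    """Sum the z-scores of positive features and subtract those of negative ones.

    freqs:     {feature: normalised frequency in the new text}
    reference: {feature: (mean, standard deviation)} from the reference data
    """
    z = {
        name: z_score(freqs[name], *reference[name])
        for name in positive + negative
    }
    return sum(z[n] for n in positive) - sum(z[n] for n in negative)


# Placeholder reference statistics (NOT Biber's published figures).
reference = {
    "private_verbs": (20.0, 10.0),
    "contractions": (10.0, 5.0),
    "nouns": (200.0, 40.0),
}

# Placeholder frequencies per 1000 words for one text.
text_a = {"private_verbs": 30.0, "contractions": 15.0, "nouns": 160.0}

score = dimension_score(
    text_a,
    reference,
    positive=["private_verbs", "contractions"],  # illustrative polarity
    negative=["nouns"],
)
print(score)
```

Whichever source the means and standard deviations come from, the same `reference` table would have to be used for every text so that the resulting scores are comparable with one another.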

To sum up, my questions are:

1) Is it possible to calculate the factor scores of a new text using the
factors that Biber used for his study?

2) Would the results be affected by the fact that my texts have to be
normalised to 100 words, whereas Biber's texts were normalised to 1000 words?

3) When calculating the factor scores for my texts, which means should I
use: the ones taken from my own dataset or the ones taken from Biber's
study?

Thank you very much in advance.

Best regards,

Andrea Nini
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora