[Corpora-List] Help with Biber's MD analysis

Tony Berber Sardinha tony at corpuslg.org
Sat Feb 5 18:13:23 UTC 2011


Dear Andrea

1) yes
2) yes
3) his means
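
For what it is worth, here is a minimal Python sketch of the procedure described in the quoted message below: normalise each feature count to a rate per 1,000 words, standardise it against reference means and standard deviations, then sum the z-scores of the positively loading features and subtract those of the negatively loading ones. Everything named here is an assumption for illustration only: the feature names (feat_a, feat_b, feat_c), the reference means and standard deviations, and the positive/negative groupings are placeholders, not values taken from Biber's book. In a real analysis they would be replaced with the published figures and the features that actually load on each dimension.

```python
from typing import Dict, Iterable


def normalise(count: int, text_length: int, basis: int = 1000) -> float:
    """Convert a raw feature count to a rate per `basis` words."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return count * basis / text_length


def z_score(value: float, mean: float, sd: float) -> float:
    """Standardise a normalised frequency against a reference mean and SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


def dimension_score(
    rates: Dict[str, float],
    ref_means: Dict[str, float],
    ref_sds: Dict[str, float],
    positive: Iterable[str],
    negative: Iterable[str],
) -> float:
    """Sum z-scores of positive features and subtract those of negative ones."""
    pos = sum(z_score(rates[f], ref_means[f], ref_sds[f]) for f in positive)
    neg = sum(z_score(rates[f], ref_means[f], ref_sds[f]) for f in negative)
    return pos - neg


# Hypothetical 250-word text with made-up raw counts.
text_length = 250
raw_counts = {"feat_a": 5, "feat_b": 2, "feat_c": 1}
rates = {f: normalise(c, text_length) for f, c in raw_counts.items()}

# Placeholder reference statistics (NOT Biber's published values).
ref_means = {"feat_a": 10.0, "feat_b": 12.0, "feat_c": 6.0}
ref_sds = {"feat_a": 5.0, "feat_b": 4.0, "feat_c": 2.0}

score = dimension_score(
    rates,
    ref_means,
    ref_sds,
    positive=["feat_a", "feat_b"],  # hypothetical positive loadings
    negative=["feat_c"],            # hypothetical negative loading
)
print(score)
```

Note that in a short text a single occurrence of a feature moves the rate per 1,000 words a long way (one hit in 250 words is already 4 per 1,000), so scores for very short texts will be noisier than scores for longer ones.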

bye

tony



On Feb 5, 2011, at 8:14 AM, Andrea Nini wrote:

> Dear all,
>
> I am a PhD student conducting research on forensic linguistics but  
> using corpus methodologies. I am particularly interested in Biber's  
> MD framework and I would like to apply it to the analysis of my  
> dataset. Unfortunately I am coming across several problems and I was  
> hoping that someone in the Corpora List would be kind enough to help  
> me out.
>
> What I would like to do exactly is calculate the dimension scores  
> for the texts that I have got in order to see where these texts fall  
> in Biber's dimensional space. For example, I want to analyse Text A  
> for the 67 variables that Biber used, then calculate the factor  
> score for that text for each dimension and then compare the text's  
> factor scores with the means that Biber himself provides in his  
> book in order to see where that text falls.
>
> The problems that I am encountering are several. Not being extremely  
> competent with statistics, first of all I wonder whether it is even  
> possible to do such a thing. Secondly, in Biber's study all the  
> texts are normalised to 1000 words, whereas in my case I have texts  
> that are just a few hundred words long and that would need to be  
> normalised to 100 words. Could this affect the results in any way?
>
> The procedure for doing it, according to what I read in the  
> literature, is to standardise the frequencies at a mean of 0.0 and  
> standard deviation of 1.0 and then sum up the positive features and  
> subtract the negative ones. As far as I understand, to standardise  
> a frequency one has to apply the following formula: [((normalised  
> frequency of the feature for the text I want to calculate the score  
> for) minus (the mean for that feature for the whole genre)) divided  
> by (the standard deviation for that feature for the whole genre)].
> A question here, though, is whether I should use as means for the  
> whole genre the means from my own dataset or the ones that Biber  
> published in the appendix of his "Variation Across Speech and  
> Writing".
>
> To sum up, my questions are:
>
> 1) is it possible to calculate the factor scores of a new text using  
> the factors that Biber used for his study?
>
> 2) would the fact that my texts have to be normalised to 100 words,  
> whereas Biber's texts were normalised to 1000 words, affect the  
> results?
>
> 3) when calculating the factor scores for my texts, what means  
> should I consider? The ones taken from my dataset or the ones taken  
> from Biber's study?
>
>
> Thank you very much in advance.
>
>
> Best regards,
>
>
>
> Andrea Nini
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

