Dear all,<br><br>I am a PhD student researching forensic linguistics using corpus methodologies. I am particularly interested in Biber's multidimensional (MD) framework and I would like to apply it to the analysis of my dataset. Unfortunately, I have come across several problems and I was hoping that someone on the Corpora List would be kind enough to help me out.<br>


<br>What I would like to do is calculate the dimension scores for my texts in order to see where they fall in Biber's dimensional space. For example, I want to analyse Text A for the 67 variables that Biber used, calculate the factor score of that text for each dimension, and then compare those factor scores with the means that Biber himself provides in his book in order to see where that text falls.<br>


<br>I am encountering several problems. Not being very competent with statistics, I wonder first of all whether it is even possible to do such a thing. Secondly, in Biber's study all the texts are normalised to 1000 words, whereas my texts are just a few hundred words long and would need to be normalised to 100 words. Could this affect the results in any way?<br>
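To make the normalisation step concrete, here is a minimal sketch in Python. The function name `normalise` and the example counts are hypothetical and chosen only for illustration. It shows that rates per 100 and per 1000 words differ only by a constant factor of ten, and that in a short text a single occurrence of a feature moves the normalised rate in fairly large steps.

```python
def normalise(count: int, text_length: int, basis: int = 1000) -> float:
    """Return the frequency of a feature per `basis` words of running text."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return count * basis / text_length


# Hypothetical example: a feature occurring 3 times in a 250-word text.
per_1000 = normalise(3, 250)           # rate per 1000 words
per_100 = normalise(3, 250, basis=100)  # rate per 100 words

# Size of the jump in the per-1000 rate caused by one extra occurrence
# in a 250-word text.
step = normalise(1, 250)

print(per_1000, per_100, step)
```

Whether steps of this size matter for the dimension scores is exactly the open question; the sketch only shows the arithmetic involved.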


<br>According to what I have read in the literature, the procedure is to standardise the frequencies to a mean of 0.0 and a standard deviation of 1.0, and then to sum the standardised scores of the positive features and subtract those of the negative features. As far as I understand, a frequency is standardised with the following formula: [((normalised frequency of the feature in the text I want to score) minus (the mean of that feature for the whole genre)) divided by (the standard deviation of that feature for the whole genre)]. <br>


A question here, though, is whether the means for the whole genre should be the means from my own dataset or the ones that Biber published in the appendix of his "Variation Across Speech and Writing". <br><br>
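The procedure described above can be sketched as follows. This is a minimal illustration, not Biber's own implementation: the function names, the feature names, and all numbers are made up, and the reference means and standard deviations are passed in as parameters precisely because which ones to use (my own dataset's or Biber's published ones) is the open question.

```python
from typing import Dict, Iterable


def z_score(freq: float, mean: float, sd: float) -> float:
    """Standardise a normalised frequency against a reference mean and SD."""
    if sd == 0:
        raise ValueError("standard deviation must be non-zero")
    return (freq - mean) / sd


def dimension_score(
    freqs: Dict[str, float],
    means: Dict[str, float],
    sds: Dict[str, float],
    positive: Iterable[str],
    negative: Iterable[str],
) -> float:
    """Sum the z-scores of the positive features and subtract the negative ones."""
    pos = sum(z_score(freqs[f], means[f], sds[f]) for f in positive)
    neg = sum(z_score(freqs[f], means[f], sds[f]) for f in negative)
    return pos - neg


# Made-up illustrative values (NOT taken from Biber's appendix).
freqs = {"feature_a": 30.0, "feature_b": 10.0, "feature_c": 5.0}
means = {"feature_a": 20.0, "feature_b": 12.0, "feature_c": 4.0}
sds = {"feature_a": 5.0, "feature_b": 4.0, "feature_c": 2.0}

score = dimension_score(
    freqs, means, sds,
    positive=["feature_a", "feature_b"],
    negative=["feature_c"],
)
print(score)
```

Swapping the `means` and `sds` dictionaries for a different reference set is the only change needed to compare the two choices.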


To sum up, my questions are:<br><br>1) Is it possible to calculate the factor scores of a new text using the factors that Biber used in his study?<br><br>2) Would the fact that my texts have to be normalised to 100 words, whereas Biber's texts were normalised to 1000 words, affect the results?<br>


<br>3) When calculating the factor scores for my texts, which means should I use: the ones taken from my dataset or the ones taken from Biber's study?<br><br><br>Thank you very much in advance.<br><br><br>Best regards,<br>


<br>

<br><br>Andrea Nini<br>