[Corpora-List] Help with Biber's MD analysis
Tony Berber Sardinha
tony at corpuslg.org
Sun Feb 6 12:58:39 UTC 2011
Dear Andrea
In my previous reply, I assumed you were _not_ going to run a factor
analysis on your data but were instead going to use Biber's 1988
dimensions and plot your means on them. In that case, the rate of
normalization must be the same as his, and you should use his means
rather than the ones for your own corpus.
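To make this first case concrete, here is a minimal Python sketch of the calculation: normalise the raw counts to a rate per 1,000 words, standardise each rate against reference statistics, then add the positive features and subtract the negative ones. The feature names, the reference means and standard deviations, and the positive/negative groupings are made-up placeholders, not values from Biber (1988); the published figures would have to be substituted.

```python
# Sketch only: feature names, reference statistics and feature groupings
# are hypothetical placeholders, NOT values taken from Biber (1988).

def normalise(count, text_length, per=1000):
    """Convert a raw count into a rate per `per` words."""
    return count * per / text_length

def z_score(rate, ref_mean, ref_sd):
    """Standardise a rate against a reference mean and standard deviation."""
    return (rate - ref_mean) / ref_sd

def dimension_score(rates, ref_stats, positive, negative):
    """Sum the z-scores of the positive features, subtract the negative ones."""
    z = {f: z_score(rates[f], *ref_stats[f]) for f in rates}
    return sum(z[f] for f in positive) - sum(z[f] for f in negative)

# Hypothetical 400-word text with raw counts for three features.
text_length = 400
counts = {"feature_a": 8, "feature_b": 2, "feature_c": 12}

# Same normalisation rate as the reference study: per 1,000 words.
rates = {f: normalise(c, text_length) for f, c in counts.items()}

# Placeholder (mean, sd) pairs per 1,000 words, standing in for published figures.
ref_stats = {
    "feature_a": (10.0, 5.0),
    "feature_b": (3.0, 2.0),
    "feature_c": (40.0, 10.0),
}

score = dimension_score(
    rates, ref_stats,
    positive=["feature_a", "feature_b"],
    negative=["feature_c"],
)
print(score)
```

The rates and the reference statistics have to be on the same scale here, which is why the normalisation rate must match.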
But if you want to come up with your own dimensions, you'll have to
run a factor analysis on your data. In this case the rate of
normalization does not need to match his, and the means you use must
be the ones coming from your dataset.
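One way to see why the normalization rate stops mattering in this second case: when frequencies are standardised against the mean and standard deviation of the same dataset, rescaling them (per 100 or per 1,000 words) leaves the standardised values unchanged. The short Python sketch below illustrates only that standardisation step, not the factor analysis itself; the rates are invented, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def standardise(rates):
    """Z-scores computed from the dataset's own mean and sample standard deviation."""
    m, s = mean(rates), stdev(rates)
    return [(r - m) / s for r in rates]

# Hypothetical rates of one feature in three texts, per 100 words...
per_100 = [2.0, 4.0, 6.0]
# ...and the same rates expressed per 1,000 words.
per_1000 = [r * 10 for r in per_100]

z_100 = standardise(per_100)
z_1000 = standardise(per_1000)

print(z_100)
print(z_1000)
```

Both calls should give the same standardised values, since multiplying every rate by a constant scales the mean and the standard deviation by that same constant.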
I'm sorry if I misunderstood your query.
bye
tony
On Feb 5, 2011, at 8:14 AM, Andrea Nini wrote:
> Dear all,
>
> I am a PhD student conducting research on forensic linguistics but
> using corpus methodologies. I am particularly interested in Biber's
> MD framework and I would like to apply it to the analysis of my
> dataset. Unfortunately I am coming across several problems and I was
> hoping that someone in the Corpora List would be kind enough to help
> me out.
>
> What I would like to do exactly is calculate the dimension scores
> for the texts that I have got in order to see where these texts fall
> in Biber's dimensional space. For example, I want to analyse Text A
> for the 67 variables that Biber used, then calculate the factorial
> score for that text for each dimension and then compare the texts'
> factorial scores with the means that Biber himself provides in his
> book in order to see where that text falls.
>
> The problems that I am encountering are several. Not being extremely
> competent with statistics, first of all I wonder whether it is even
> possible to do such a thing. Secondly, in Biber's study all the
> texts are normalised to 1000 words, whereas in my case I have texts
> that are just a few hundred words long and that would need to be
> normalised to 100 words. Could this affect the results in any way?
>
> The procedure for doing it, according to what I read in the
> literature, is to standardise the frequencies at a mean of 0.0 and
> standard deviation of 1.0 and then sum up the positive features and
> subtract the negative ones. As far as I understand, to standardise
> a frequency one has to apply the following formula: [((normalised
> frequency of the feature for the text I want to calculate the score
> for) minus (the mean for that feature for the whole genre)) divided
> by (the standard deviation for that feature for the whole genre)].
> A question here, though, is whether I should use as means for the
> whole genre the means from my own datasets or the ones that Biber
> published in the appendix of his "Variation Across Speech and
> Writing".
>
> To sum up, my questions are:
>
> 1) is it possible to calculate the factor scores of a new text using
> the factors that Biber used for his study?
>
> 2) would the fact that my texts have to be normalised to 100 words,
> whereas Biber's texts were normalised to 1000 words, affect the
> results?
>
> 3) when calculating the factor scores for my texts, what means
> should I consider? The ones taken from my dataset or the ones taken
> from Biber's study?
>
>
> Thank you very much in advance.
>
>
> Best regards,
>
>
>
> Andrea Nini
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora