[Corpora-List] "normalizing" frequencies for different-sized corpora

Jenny Eagleton jenny at asian-emphasis.com
Mon Sep 12 08:08:35 UTC 2005


Hello Corpora and Statistics Experts,

This is a very simple question for the corpora/statistics experts out
there, but this novice is not really mathematically inclined. I
understand Biber's principle of "normalization"; however, I am not sure
how to calculate it. I want frequency counts normalized per 1,000 words
of text. I can see how to do it if the figures are even, e.g. if I have
a corpus of 4,000 words and a frequency of 200, I would have a
normalized figure of 50.

But how would I calculate it for uneven numbers? For example, if I have
2,646 instances of a certain kind of noun in a corpus of 55,166 words,
how would I calculate the normalized figure per 1,000 words?

Regards,

Jenny
Research Assistant
Dept. of English & Communication
City University of Hong Kong
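
A minimal sketch of the arithmetic asked about above, assuming the usual
reading of normalization as (raw count / corpus size) x 1,000. The
function name normalize_per and its base parameter are illustrative
only; they do not come from Biber or from any library.

```python
def normalize_per(count, corpus_size, base=1000):
    """Return the frequency per `base` words (hypothetical helper)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing so that even cases come out exact.
    return count * base / corpus_size


# The even case from the message: 200 hits in 4,000 words.
even = normalize_per(200, 4000)

# The uneven case from the message: 2,646 hits in 55,166 words.
uneven = normalize_per(2646, 55166)

print(round(even, 2))
print(round(uneven, 2))
```

Under that assumption the even case gives 50 per 1,000 words, matching
the figure in the message, and the uneven case comes out at roughly
47.96 per 1,000 words.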

