<html><head><meta content="Group-Office 2.04" name="GENERATOR"></head><body>


Thanks for the quick response from everybody, I have got the idea now.<br /><br />Jenny<br /><blockquote style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color rgb(34, 67, 127); border-width: 0pt 0pt 0pt 2px; margin: 0px 0px 0px 5px; padding: 0px 0px 0px 5px;"><font size="2" face="verdana">----- Original Message -----<br /><b>Subject: </b>Re: [Corpora-List] "normalizing" frequencies for different-sized corpora<br /><b>From: </b>eric@comp.leeds.ac.uk<br /><b>To: </b>jenny@asian-emphasis.com<br /><b>Date: </b>12-09-2005 16:59<br /></font><br /><br />Jenny,<br /><br />I may be missing something, but I think the way to find a per-thousand<br />figure is simply:<br /><br /><br />( (freq of word) / (no of words in text) ) * 1000<br /><br />eg (200/4000) * 1000   = 50<br /><br />or (2646/55166) * 1000  = 48  (to nearest whole number)<br /><br />  - of course it's up to you whether to round to

nearest whole number,<br />    or give the answer to 2 decimal places (47.96)  or some other level<br />of accuracy; but since generally a text is only a sample or<br />approximation of the language you are studying, it is sensible not to<br />claim too much accuracy/significance.<br /><br />eric atwell<br /><br /><br />On Mon, 12 Sep 2005, Jenny Eagleton wrote:<br /><br />> Hello Corpora and Statistics Experts,<br />><br />> This is a very simple question for all the<br />> corpora/statistics experts<br />> out there, but this novice is not really<br />> mathematically inclined. I<br />> understand Biber's principle of "normalization,"<br />> however I am not sure<br />> about how to calculate it. I want frequency counts<br />> normalized per<br />> 1,000 words of text. I can see how to do it if the<br />> figures are even,<br />> i.e. if I have a corpus of 4,000 words and a<br />> frequency of 200,&#160;<br />> I would have a

normalized figure of 50.<br />><br />> But for mixed numbers, how would I calculate the<br />> following: For<br />> example if I have 2,646 instances of a certain<br />> kind of noun in a<br />> corpus of 55,166 how would I calculate the<br />> normalized figure per<br />> 1,000 words?<br />><br />> Regards,<br />><br />> Jenny<br />> Research Assistant<br />> Dept. of English & Communication<br />> City University of Hong Kong<br />><br />><br />><br /><br />-- <br />Eric Atwell, Senior Lecturer, Language research group, School of Computing, <br />Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England<br />TEL: +44-113-2335430  FAX: +44-113-2335468  <a class="blue" target="_blank" href="http://www.comp.leeds.ac.uk/eric">http://www.comp.leeds.ac.uk/eric</a><br /></blockquote></body></html>