<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    There are also English texts without THE (lists of products,
    election results etc.) so the computation either way would need to
    avoid dividing by zero...<br>
    <br>
    What a useful discussion. Clarified a particularly cluttered and
    dusty corner of my own thinking.<br>
     <br>
    Cheers -- Mike<br>
    <br>
    On 13/09/2011 19:19, Rich Cooper wrote:
    <blockquote cite="mid:8F5B2CF749FB48498957BEBC9BE963FF@Gateway"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=ISO-8859-1">
      <meta name="Generator" content="Microsoft Word 11 (filtered
        medium)">
      <!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]-->
      <style>
<!--a:link
        {mso-style-priority:99;}
span.MSOHYPERLINK
        {mso-style-priority:99;}
a:visited
        {mso-style-priority:99;}
span.MSOHYPERLINKFOLLOWED
        {mso-style-priority:99;}

 /* Font Definitions */
 @font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal;
        font-family:Calibri;
        color:#1F497D;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:Arial;
        color:blue;
        font-weight:normal;
        font-style:normal;
        text-decoration:none none;}
@page Section1
        {size:8.5in 11.0in;
        margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.Section1
        {page:Section1;}
-->
</style><!--[if gte mso 9]><xml>
 <o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1" />
 </o:shapelayout></xml><![endif]-->
      <div class="Section1">
        <p class="MsoNormal"><font color="blue" face="Arial" size="2"><span
              style="font-size:
              10.0pt;font-family:Arial;color:blue">Using “the/I” can
              lead to
              infinite values in corpora (scientific lit, patents) that
              never use the pronoun
              “I”.  It might be better practice to use the inverse, i.e.
              the
              “I/the” ration, which would be 0.0 for such corpora.  </span></font></p>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Mike Scott

***
If you publish research which uses WordSmith, do let me know so I can include it at
<a class="moz-txt-link-freetext" href="http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm">http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm</a>
***
University of Aston and Lexical Analysis Software Ltd.
<a class="moz-txt-link-abbreviated" href="mailto:mike.scott@aston.ac.uk">mike.scott@aston.ac.uk</a>
<a class="moz-txt-link-abbreviated" href="http://www.lexically.net">www.lexically.net</a>
</pre>
  </body>
</html>