Corpora: statistics in CL question
    Alexander S. Yeh
    asy at mitre.org
    Mon Mar 27 23:49:54 UTC 2000

Recently, I saw the following statement (author is unknown):
>In most studies of z-scores and t-scores in computational linguistics,
>you tend to find that scores are too high.  When you compute scores
>for bigrams, for example, you would expect 5% of the scores would be
>greater than 1.65, but you tend to find more than that.
I am trying to find the studies referred to, and to learn why some people
believe that the scores are too high. Thank you.
-Alex Yeh (asy at mitre.org)
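
For reference, the 1.65 threshold in the quoted statement matches the
approximate one-sided 5% critical value of the standard normal
distribution, which is presumably what the quote assumes. A minimal Python
sketch of that check follows; the function name upper_tail is illustrative
and not taken from the quoted text.

```python
import math


def upper_tail(z: float) -> float:
    """Return P(Z > z) for a standard normal variable Z."""
    # The complementary error function gives the upper tail directly:
    # P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Fraction of scores expected above 1.65 if the scores really were
    # standard normal (the assumption behind the quoted 5% figure).
    print(f"P(Z > 1.65) = {upper_tail(1.65):.4f}")
```

If the observed fraction of bigram scores above 1.65 is well over this
value, the scores are not behaving like standard normal draws, which
seems to be the sense of "too high" in the quote.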
    
    