[Corpora-List] Brown Corpus

Jean Veronis Jean.Veronis at up.univ-mrs.fr
Fri Jun 17 12:28:24 UTC 2005


Hi Adam,

Although I agree on the same-size sample design, I am less convinced by
the use of the mean and standard deviation on corpora (as well as
t-score and a few others). The distributions are so strongly skewed that
these measures are probably not advisable. Without getting into anything
too complicated, the median and measures based on it, like the MAD (median
absolute deviation), and in general what's called "robust statistics",
seem preferable to me.
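To illustrate the point, here is a minimal sketch using Python's standard statistics module. The counts are invented for illustration: they are assumed to be the frequencies of one word in ten same-size samples, where the word is "bursty" in a single sample. The helper median_absolute_deviation is a hypothetical name, and it returns the unscaled MAD (some tools multiply it by about 1.4826 so it is comparable to a standard deviation under normality).

```python
import statistics

def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

# Hypothetical per-sample frequencies of one word; one sample is an outlier.
counts = [3, 4, 2, 5, 3, 4, 3, 60, 2, 4]

mean = statistics.mean(counts)            # pulled far up by the single 60
sd = statistics.stdev(counts)             # inflated by the same outlier
median = statistics.median(counts)        # stays near the typical value
mad = median_absolute_deviation(counts)   # stays near the typical spread

print(f"mean={mean:.2f} sd={sd:.2f} median={median} MAD={mad}")

# Dropping the outlier moves the mean a lot but the median only slightly.
without_outlier = [c for c in counts if c != 60]
print(f"mean={statistics.mean(without_outlier):.2f} "
      f"median={statistics.median(without_outlier)}")
```

Here the mean (9.0) is larger than nine of the ten counts, while the median (3.5) and MAD (0.5) still describe the typical sample; removing the one outlier takes the mean from 9.0 to about 3.33 but the median only from 3.5 to 3.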

--j
  http://aixtal.blogspot.com


