[Corpora-List] Brown Corpus
Jean Veronis
Jean.Veronis at up.univ-mrs.fr
Fri Jun 17 12:28:24 UTC 2005
Hi Adam,
Although I agree on the same-size sample design, I am less convinced by
the use of the mean and standard deviation on corpora (as well as
t-score and a few others). The distributions are so strongly skewed that
these measures are probably not advisable. Without getting into anything
too complicated, the median and measures based on it, like the MAD (median
absolute deviation), and in general what's called "robust statistics",
seem preferable to me.
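
To make the point concrete, here is a minimal sketch in Python. The counts
are invented for illustration (a hypothetical word's frequency in ten
equal-size samples, one of which uses it heavily) and are not taken from
the Brown Corpus or any real data. The helper name mad is likewise just an
assumption for this example, and MAD is taken here to be the median of the
absolute deviations from the median.

```python
from statistics import mean, median, pstdev


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    m = median(values)
    return median([abs(x - m) for x in values])


# Hypothetical counts of one word in ten same-size samples.
# The last sample is a single "bursty" text that skews the distribution.
counts = [3, 5, 4, 6, 5, 4, 7, 5, 6, 95]

sample_mean = mean(counts)      # pulled upward by the one extreme sample
sample_sd = pstdev(counts)      # population SD, dominated by that sample
sample_median = median(counts)  # reflects the typical sample
sample_mad = mad(counts)        # spread of the typical samples
```

With these made-up counts, the mean and standard deviation mostly describe
the one outlying sample, while the median and MAD stay close to what the
other nine samples look like.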
--j
http://aixtal.blogspot.com