[Corpora-List] calculation problem

Marco Baroni baroni at sslmit.unibo.it
Thu Oct 20 17:20:41 UTC 2005


Dear Alexander,

I'm a bit confused...

> if you assume that occurrences in your corpus are distributed uniformly
> (actually the simplest probability distribution ever), you can take this 100
> number
>
> Otherwise, if you use another distribution that better describes behaviour
> of the occurrences it will influence the number of occurrences in the 1
> million corpus and will probably not be 100.
>

Isn't the problem rather one of (non-random) sampling, and not a matter of
the assumed distribution (which, as far as I can tell, is not assumed to be
uniform)?

Regards,

Marco


