[Corpora-List] calculation problem
Marco Baroni
baroni at sslmit.unibo.it
Thu Oct 20 17:20:41 UTC 2005
Dear Alexander,
I'm a bit confused...
> if you assume that occurrences in your corpus are distributed uniformly
> (actually the simplest probability distribution ever), you can take this 100
> number
>
> Otherwise, if you use another distribution that better describes behaviour
> of the occurrences it will influence the number of occurrences in the 1
> million corpus and will be probably not 100.
>
Isn't the problem rather one of (non-random) sampling, and not a matter of
the assumed distribution (which, as far as I can tell, is not assumed to be
uniform)?
Regards,
Marco