[Corpora-List] calculation problem
Alexander Osherenko
osherenko at gmx.de
Fri Oct 21 09:20:28 UTC 2005
Dear Marco,
I tried to give the simplest possible explanation.
You say that "bad sampling" is the problem. I don't dispute that, but in
bootstrapping you have to make some assumptions if you want to get anywhere.
One such set of assumptions is: the sampling is good, I take the simplest
distribution, and I calculate the results.
If you are not satisfied with the system's results (which is itself a problem:
what counts as a good measure of system quality?), you can always
choose another distribution and increase the number of samples.
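
As an illustration, here is a minimal Python sketch of that idea. It assumes
the corpus can be split into documents, each with a hit count and a token
count. The per-document numbers and the function names per_million and
bootstrap_per_million are invented for the example; they are not Helene's
data. The point estimate relies on the uniform-rate assumption, and the
resampling shows how much the figure moves when the sample changes.

```python
import random


def per_million(count, corpus_size):
    """Scale a raw count to occurrences per million tokens.

    This is only meaningful if the rate is assumed to be uniform.
    """
    return count * 1_000_000 / corpus_size


def bootstrap_per_million(docs, n_resamples=1000, seed=0):
    """Resample documents with replacement and return the sorted rates.

    docs is a list of (hits, tokens) pairs, one pair per document.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_resamples):
        # Draw as many documents as the corpus has, with replacement.
        sample = [rng.choice(docs) for _ in docs]
        hits = sum(h for h, _ in sample)
        tokens = sum(t for _, t in sample)
        rates.append(per_million(hits, tokens))
    rates.sort()
    return rates


# Hypothetical corpus: 5 documents of 10,000 tokens each, 5 hits in total.
docs = [(0, 10_000), (1, 10_000), (0, 10_000), (4, 10_000), (0, 10_000)]

point = per_million(sum(h for h, _ in docs), sum(t for _, t in docs))
rates = bootstrap_per_million(docs)

# Rough 95% percentile interval taken from the resampled rates.
low = rates[int(0.025 * len(rates))]
high = rates[int(0.975 * len(rates)) - 1]
print(point, low, high)
```

With these invented counts the point estimate is 100 per million, but most of
the hits come from a single document, so the resampled rates spread widely.
Increasing n_resamples, or replacing the resampling step with a different
model, corresponds to choosing another distribution and more samples.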
Cheers,
Alexander
P.S. BTW, I don't think that Helene wanted a thorough mathematical
explanation of her case.
> --- Original Message ---
> From: Marco Baroni <baroni at sslmit.unibo.it>
> To: Alexander Osherenko <osherenko at gmx.de>
> Cc: CORPORA at UIB.NO
> Subject: Re: [Corpora-List] calculation problem
> Date: Thu, 20 Oct 2005 19:20:41 +0200
>
> Dear Alexander,
>
> I'm a bit confused...
>
> > if you assume that occurrences in your corpus are distributed uniformly
> > (actually the simplest probability distribution ever), you can take this
> > number of 100.
> >
> > Otherwise, if you use another distribution that better describes the
> > behaviour of the occurrences, it will influence the number of occurrences
> > in the 1 million corpus, which will probably not be 100.
> >
>
> Isn't the problem rather one of (non-random) sampling, and not a matter of
> the assumed distribution (which, as far as I can tell, is not assumed to be
> uniform)?
>
> Regards,
>
> Marco
>
>
>