[Corpora-List] calculation problem

Alexander Osherenko osherenko at gmx.de
Fri Oct 21 09:20:28 UTC 2005


Dear Marco,

I tried to give the simplest explanation.

You say that bad sampling is the problem. I don't dispute that, but in
bootstrapping you have to make some assumptions if you want to get anywhere.
Mine are: the sampling is good, so I take the simplest distribution and
calculate the results.

If you are not satisfied with the system's results (which raises a problem
of its own: what counts as a good measure of system quality?), you can
always choose another distribution and increase the number of samples.
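
To make this concrete, here is a minimal Python sketch of both steps: scaling
a raw count to a rate per million words under the assumption that occurrences
are spread evenly, and a bootstrap over corpus chunks to see how much that
rate moves. The chunk counts, the chunk size and the function names are
invented for illustration and are not Helene's data; resampling whole chunks
is only one possible choice of resampling unit.

```python
import random


def per_million(count, corpus_size):
    """Scale a raw count to a rate per million tokens (assumes an even spread)."""
    return count * 1_000_000 / corpus_size


def bootstrap_rates(chunk_counts, chunk_size, n_resamples=1000, seed=0):
    """Resample chunks with replacement; return the sorted per-million rates."""
    rng = random.Random(seed)
    n = len(chunk_counts)
    rates = []
    for _ in range(n_resamples):
        sample = [rng.choice(chunk_counts) for _ in range(n)]
        rates.append(per_million(sum(sample), n * chunk_size))
    return sorted(rates)


# Hypothetical data: occurrences of one word in ten chunks of 10,000 tokens.
chunk_counts = [0, 0, 3, 0, 1, 0, 5, 0, 0, 1]
chunk_size = 10_000

point = per_million(sum(chunk_counts), len(chunk_counts) * chunk_size)
rates = bootstrap_rates(chunk_counts, chunk_size)

# Approximate 95% percentile interval from the sorted bootstrap rates.
low = rates[len(rates) * 25 // 1000]
high = rates[len(rates) * 975 // 1000 - 1]

print(f"point estimate: {point:.0f} per million")
print(f"approx. 95% bootstrap interval: {low:.0f} to {high:.0f} per million")
```

With these made-up counts the point estimate is 100 per million, but the
resampled rates spread widely around it because the occurrences sit in only a
few chunks. That spread is what the simple scaling hides.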

Cheers,

Alexander

P.S. BTW, I don't think that Helene wanted a thorough mathematical
explanation of her case.

> --- Original Message ---
> From: Marco Baroni <baroni at sslmit.unibo.it>
> To: Alexander Osherenko <osherenko at gmx.de>
> Cc: CORPORA at UIB.NO
> Subject: Re: [Corpora-List] calculation problem
> Date: Thu, 20 Oct 2005 19:20:41 +0200
> 
> Dear Alexander,
> 
> I'm a bit confused...
> 
> > if you assume that occurrences in your corpus are distributed uniformly
> > (actually the simplest probability distribution ever), you can take this
> > 100 number
> >
> > Otherwise, if you use another distribution that better describes
> > behaviour of the occurrences it will influence the number of occurrences
> > in the 1 million corpus and will be probably not 100.
> >
> 
> Isn't the problem rather one of (non-random) sampling, and not a matter of
> the assumed distribution (which, as far as I can tell, is not assumed to be
> uniform)?
> 
> Regards,
> 
> Marco
> 
> 
> 



