[Corpora-List] calculation problem

Alexander Osherenko osherenko at gmx.de
Thu Oct 20 13:36:14 UTC 2005


Hello Helene,

If you assume that occurrences in your corpus are distributed uniformly
(the simplest probability distribution there is), you can take this figure
of 100.

Otherwise, if you use another distribution that better describes the
behaviour of the occurrences, it will influence the expected number of
occurrences in the 1 million word corpus, which will then probably not be 100.
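
As a rough sketch of the arithmetic in Python: the first function is the
proportional scaling that the uniform assumption gives you. The second is my
own added assumption, not something from your question: it treats every word
position as an independent trial (a binomial model) to show how much the
count could vary even in that simple case. Both function names are
hypothetical and do not come from any library.

```python
import math

def expected_count(count, corpus_size, target_size):
    """Scale an observed count proportionally to a corpus of another size."""
    return count * target_size / corpus_size

def binomial_spread(count, corpus_size, target_size):
    """Standard deviation of the count in the target corpus, ASSUMING each
    word position is an independent trial with the observed relative
    frequency (an illustrative assumption, not a property of real text)."""
    p = count / corpus_size
    return math.sqrt(target_size * p * (1 - p))

expected = expected_count(500, 5_000_000, 1_000_000)
spread = binomial_spread(500, 5_000_000, 1_000_000)

print(expected)          # 100.0
print(round(spread, 2))  # 10.0
```

So even under these simple assumptions, 100 is an expectation with a spread
of roughly 10 either way, not an exact prediction.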

Cheers,

Alexander

> --- Original Message ---
> From: "STENGERS, Helene" <Helene.Stengers at ehb.be>
> To: CORPORA at UIB.NO
> Subject: [Corpora-List] calculation problem
> Date: Wed, 19 Oct 2005 14:14:55 +0200 (Romance (zomertijd))
> 
> 
>  
>  
> Hello dear list members,
>  
>  
> I have an arithmetic question. If a particular expression occurs, let's
> say, 500 times in a 5 million word corpus, can I assume that there will
> be 100 of these expressions in a one million word corpus, or is there a
> statistical (probability) formula which I should apply?
>  
> Cheers,
>  
> Helene Stengers
> 
> 



