[Corpora-List] N-gram string extraction

Christer Johansson christer.johansson at lili.uib.no
Wed Aug 28 11:27:32 UTC 2002


andrius at ccl.bham.ac.uk wrote:
 ...
> It's running for the 7th day now.
>


My guess:

Somewhere a sort operation is needed, and I suspect it is implemented in a
"simple for the programmer" way, which means its running time is likely
somewhere between n*n and n*n*n. Unix sort uses more efficient algorithms
that are closer to n*log n. With one million keys, the slow versions would
take between 10^12 and 10^18 operations; the fast version takes about
10^6 * log2(10^6), which is roughly 20*10^6. This is most likely where your
problem is.
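
The arithmetic above can be checked with a small sketch. It only compares
the growth rates for one million keys; it does not measure any real sort
implementation, and the function name estimated_operations is made up for
illustration.

```python
import math


def estimated_operations(n):
    """Rough operation counts for sorting n keys under each growth rate."""
    return {
        "n*log2(n)": n * math.log2(n),  # efficient sort (merge sort and similar)
        "n*n": n ** 2,                  # simple quadratic sort
        "n*n*n": n ** 3,                # pessimistic upper guess from the text
    }


# One million keys, as in the estimate above.
counts = estimated_operations(10 ** 6)
for label, ops in counts.items():
    print(f"{label:>10}: {ops:.2e} operations")
```

Even at the optimistic end of the slow range, the quadratic version does
about 50,000 times more work than the n*log n version for this input size.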

   /Christer


