[Corpora-List] Keywords Generator

Trevor Jenkins trevor.jenkins at suneidesis.com
Mon Feb 18 17:47:48 UTC 2008


On Mon, 18 Feb 2008, True Friend <true.friend2004 at gmail.com> wrote:

> Trevor Jenkins: Sorry, I forgot to mention that the size was in words:
> 1.9 million words. I also thought that the large amount of data was the
> reason.

Oh okay. So roughly 8MB to 12MB, based on an average (English) word
length of, say, 6 characters. I ran my pipe of filters across the Jane
Austen texts, including the juvenilia (which came to about 11MB); no
problem at all, other than that all the words were stuffed into one result
file. On a MacBook Pro with an Intel dual-core processor it took a matter of
seconds to create the (2.5MB) result file.
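
For anyone who wants to redo the back-of-envelope size estimate, here is a
minimal sketch. It assumes one byte per character (plain ASCII text) and,
optionally, one separator character per word; the function name and the
separator parameter are illustrative only and are not part of the pipe of
filters mentioned above.

```python
def estimate_corpus_bytes(word_count, avg_word_len, separator_len=1):
    """Rough size of a plain-text corpus in bytes.

    Assumes one byte per character and `separator_len` bytes of
    whitespace or punctuation after each word (both are assumptions).
    """
    return word_count * (avg_word_len + separator_len)


if __name__ == "__main__":
    words = 1_900_000  # corpus size quoted in the thread
    for avg_len in (4, 5, 6):
        bare = estimate_corpus_bytes(words, avg_len, separator_len=0)
        spaced = estimate_corpus_bytes(words, avg_len)
        print(f"{avg_len} chars/word: {bare / 1e6:.1f} MB bare, "
              f"{spaced / 1e6:.1f} MB with one separator per word")
```

With 6 characters per word and no separators this gives 11.4 MB, which is
in line with the rough range above; counting a space after every word
pushes it a little higher.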

Personally, I don't consider 1.9 million words to be large. I once had a
junior programmer who managed to stuff an 8MB sentence into one record.

Regards, Trevor

<>< Re: deemed!


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


