[Corpora-List] Keywords Generator
Trevor Jenkins
trevor.jenkins at suneidesis.com
Mon Feb 18 17:47:48 UTC 2008
On Mon, 18 Feb 2008, True Friend <true.friend2004 at gmail.com> wrote:
> Trevor Jenkins: Sorry I forgot to mention the size it was in words,
> 1.9million words. I also thought that large amount of data is the
> reason.
Oh okay. So roughly 8 MB to 12 MB, based on an average (English) word
length of, say, 6 characters. I ran my pipe of filters across the Jane
Austen texts, including the juvenilia (about 11 MB in total), with no
problem at all other than that all the words were stuffed into one result
file. On a MacBook Pro with an Intel dual-core processor it took a matter of
seconds to create the (2.5 MB) result file.
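
The size estimate above can be checked with a few lines of arithmetic. This is only a sketch: the function name estimate_bytes is made up for illustration, and it assumes one byte per character (plain ASCII or Latin-1 text) and, optionally, one separator character after each word.

```python
# Back-of-envelope size of a plain-text corpus.
# Assumptions (not from the thread): one byte per character, and
# optionally one separator (space or newline) after every word.

WORDS = 1_900_000      # corpus size quoted in the thread
AVG_WORD_LEN = 6       # average English word length suggested in the thread


def estimate_bytes(words, avg_word_len, separator_len=0):
    """Return the approximate size in bytes of `words` words of text."""
    return words * (avg_word_len + separator_len)


# Letters only, then letters plus one separator per word.
for sep in (0, 1):
    size_mb = estimate_bytes(WORDS, AVG_WORD_LEN, sep) / 1_000_000
    print(f"separator_len={sep}: about {size_mb:.1f} MB")
```

Either way the result is on the order of ten megabytes, which is small by the standards of current hardware.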
Personally I don't consider 1.9 million words to be large. I once had a
junior programmer who managed to stuff an 8 MB sentence into one record.
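
To give a feel for why a corpus of this size is not a burden, here is a minimal word-frequency sketch. It is not the pipe of filters mentioned above, which is not shown in this message; the tokenising pattern, the function name word_frequencies and the sample lines are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokeniser: lower-cased runs of letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def word_frequencies(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Processing one line at a time keeps only the counts in memory,
        # not the whole text.
        counts.update(WORD_RE.findall(line.lower()))
    return counts


# Hypothetical sample input; a real run would iterate over an open file.
sample = [
    "The cat sat on the mat.",
    "The dog sat on the cat.",
]
for word, n in word_frequencies(sample).most_common(3):
    print(f"{n}\t{word}")
```

Passing an open file object instead of the sample list reads the corpus line by line, so memory use grows with the size of the vocabulary rather than with the length of the text.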
Regards, Trevor
<>< Re: deemed!
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora