[Corpora-List] Keywords Generator

Trevor Jenkins trevor.jenkins at suneidesis.com
Mon Feb 18 17:23:19 UTC 2008


On Mon, 18 Feb 2008, True Friend <true.friend2004 at gmail.com> wrote:

> Hi Sir
> Tried your script but ........ it has some problems. Probably the large
> size of txt files was the reason. Corpus A was about 1.9 million and
> corpus B was almost as A.

I'll leave Alex to comment on the use of his script, but I wonder what you
are reporting with these numbers. Do you mean 1.9 million documents,
words, or characters?
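
If it helps to pin the numbers down, here is a minimal sketch (not Alex's
script and not my pipe-line) that reports the size of a text file in lines,
whitespace-separated words, and characters. The function name corpus_size
and the UTF-8 encoding are my assumptions; adjust them to suit your data.

```python
import sys

def corpus_size(lines):
    """Return (line count, word count, character count) for an iterable of lines."""
    n_lines = n_words = n_chars = 0
    for line in lines:                 # one line at a time, so memory use stays small
        n_lines += 1
        n_words += len(line.split())   # words = whitespace-separated tokens
        n_chars += len(line)           # characters, including the newline
    return n_lines, n_words, n_chars

if __name__ == "__main__":
    # Usage: python corpus_size.py corpusA.txt corpusB.txt  (file names are examples)
    for path in sys.argv[1:]:
        # Assumes UTF-8; undecodable bytes are replaced rather than raising an error
        with open(path, encoding="utf-8", errors="replace") as f:
            print(path, *corpus_size(f))
```

Because the file is read line by line, this should cope with files well
beyond a couple of million characters.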

The texts I used for my pipe-line script are all about 1.9 MB (1.9 million
characters) in size. The individual filters I used do not have a problem
processing that amount of data; I've processed larger inputs with the same
pipe-line.

It might be that Alex's quick script can't cope with the volume of data
you are throwing at it. If so, you'll either have to use something else
or improve the script to cope with large volumes.

Regards, Trevor

<>< Re: deemed!


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


