[Corpora-List] Handling a Large Text Archive

Laurence Anthony anthony0122 at gmail.com
Wed Jan 4 15:48:39 UTC 2012


On Wed, Jan 4, 2012 at 11:57 PM, True Friend <true.friend2004 at gmail.com> wrote:

> Hi
> I have a large text archive of 100+ million words in UTF-8 encoding
> (a non-English text archive). Sometimes I need to get a concordance or a
> word list, but its size creates problems. I've tried AntConc (it always
> hangs when I open the text files in it), as well as TextSTAT (it usually
> works fine for concordances but hangs when given a word list task). Is
> there any good free alternative for handling big text archives? Or any
> efficient way to handle such a large collection?
> Thanks a lot for taking time and reading this email. Your response will be
> highly appreciated.
> Regards
>
>
Hi,

AntConc is really designed for corpora of just a few million words. Also, it
assumes that each corpus file is quite small. That's why you will find it
hangs on 100+ million-word corpora. That said, I'm now working on a new
version that will (hopefully) handle 100+ million-word corpora smoothly. I'll
announce it here when it's ready.
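
In the meantime, one possible workaround is to stream the files line by
line so that no single file has to fit in memory at once. What follows is
only a minimal Python sketch of that idea, not how AntConc or TextSTAT work
internally. The function names, the context width, and the \w+ tokenizer
are all assumptions made for illustration; in particular, \w+ will split
words in scripts that rely on combining marks, so it would likely need
replacing for your language.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of Unicode word characters. This is a placeholder;
# scripts that use combining marks need a language-specific rule instead.
TOKEN = re.compile(r"\w+")


def word_list(paths, encoding="utf-8"):
    """Count word frequencies while reading one line at a time."""
    counts = Counter()
    for path in paths:
        with open(path, encoding=encoding) as handle:
            for line in handle:
                counts.update(TOKEN.findall(line.lower()))
    return counts


def concordance(paths, keyword, width=30, encoding="utf-8"):
    """Yield keyword-in-context lines without loading whole files.

    Context is limited to the line the keyword occurs on.
    """
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    for path in paths:
        with open(path, encoding=encoding) as handle:
            for line in handle:
                for match in pattern.finditer(line):
                    left = line[max(0, match.start() - width):match.start()]
                    right = line[match.end():match.end() + width].rstrip("\n")
                    yield f"{left:>{width}} [{match.group(0)}] {right}"
```

Calling word_list(paths).most_common(50) would give the 50 most frequent
words, and looping over concordance(paths, "someword") prints hits as they
are found. Only the frequency table is held in memory, so memory use grows
with the number of distinct words rather than with the size of the archive.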

Laurence Anthony