[Corpora-List] Handling a Large Text Archive

True Friend true.friend2004 at gmail.com
Wed Jan 4 14:57:07 UTC 2012


Hi,
I have a large text archive of 100+ million words in UTF-8 encoding
(a non-English text archive). Sometimes I need to get a concordance or a word
list, but the size of the archive causes problems. I've tried AntConc (it
always hangs when I open the text files in it) as well as TextSTAT (it usually
works fine for concordances but hangs when given a word list task). Is there a
good free alternative for handling big text archives? Or an efficient way to
handle such a large collection?
Thanks a lot for taking the time to read this email. Your response will be
highly appreciated.
Regards

-- 
*Muhammad Shakir Aziz* *محمد شاکر عزیز*
*Master in Applied Linguistics
Translator, Course Developer, Linguist for Urdu, Punjabi and English*
Urdu:- http://awaz-e-dost.blogspot.com/
English:- http://linguisticslearner.blogspot.com/
Facebook:- http://www.facebook.com/truefriend2004
Skype:- true_friend2004