[Corpora-List] concordance program for large files

Paul Johnston Paul.A.Johnston at manchester.ac.uk
Wed Sep 3 06:41:24 UTC 2008


On Wednesday 03 September 2008 01:10:05 
jaime.hunt at studentmail.newcastle.edu.au wrote:
> Hello everyone,
>
> I was just wondering if you know of a good concordance program that deals
> with large files of over 1 million words that I might be able to use for my
> research. Has anyone had any experience with one? There are a few free ones
> on the internet, but they often don't deal with really large files.
>
> Regards,
> Jaime
>
> Mr Jaime Hunt MAppLing (TESOL), BA (Hons)
> PhD (Linguistics) Candidate
> School of Humanities and Social Science
> McMullin Building
> University of Newcastle
> Callaghan
> NSW 2308
> Australia
>
> Ph. +61 (0)2 4921 5175
> Email: jaime.hunt at studentmail.newcastle.edu.au
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

For REALLY large files you could use the Cambridge-CMU Toolkit to build 
n-grams, word frequency lists, vocabularies and the like.
It is available at http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html and builds 
under most *nixes (well, all the ones I've ever used).
It's not a pretty graphical tool, but it gets files into formats where they 
become usable.
It handles the BNC, which is a lot bigger than 1 million words.
A word of warning, however: I have not used it much with Unicode-encoded texts!
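
To give a rough idea of the kind of output meant by word frequency lists and 
n-grams, here is a minimal Python sketch. It is not the toolkit and does not 
use its file formats; the function names (tokens, count_ngrams), the simple 
\w+ tokenizer and the sample lines are all assumptions made for illustration. 
It reads input line by line, so the text itself never has to fit in memory, 
although the counts for every distinct n-gram do.

```python
import re
from collections import Counter, deque

# Assumption: a "word" is any run of Unicode word characters.
# A real corpus will usually need a more careful tokenizer.
TOKEN_RE = re.compile(r"\w+")


def tokens(lines):
    """Yield lower-cased word tokens from an iterable of text lines."""
    for line in lines:
        for match in TOKEN_RE.finditer(line.lower()):
            yield match.group()


def count_ngrams(lines, n=2):
    """Return (word_counts, ngram_counts) for an iterable of lines.

    Lines are consumed one at a time. N-grams are allowed to span
    line breaks, because the sliding window is never reset.
    """
    word_counts = Counter()
    ngram_counts = Counter()
    window = deque(maxlen=n)  # holds the last n tokens seen
    for tok in tokens(lines):
        word_counts[tok] += 1
        window.append(tok)
        if len(window) == n:
            ngram_counts[tuple(window)] += 1
    return word_counts, ngram_counts


# Hypothetical sample input. For a real corpus, pass an open file
# instead, e.g. open("corpus.txt", encoding="utf-8").
sample = ["The cat sat on the mat.", "The cat slept."]
word_counts, bigram_counts = count_ngrams(sample, n=2)

for word, freq in word_counts.most_common(3):
    print(freq, word)
for gram, freq in bigram_counts.most_common(3):
    print(freq, " ".join(gram))
```

Opening the file with an explicit encoding, as in the comment, is one way to 
deal with Unicode texts in a sketch like this. For something the size of the 
BNC the in-memory counters may get large, which is where a dedicated toolkit 
earns its keep.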

Paul


