[Corpora-List] concordance program for large files

Pierre Nugues Pierre.Nugues at cs.lth.se
Wed Sep 3 07:11:20 UTC 2008


You may try this program in Perl (10 lines):
http://www.cs.lth.se/EDA171/Programs/ch02/concord_perl.pl
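
The linked script is not reproduced here. To give a rough idea of how a keyword-in-context (KWIC) concordancer can cope with files of any size, here is a minimal sketch in Python. It is not the Perl program above: the function name `concordance`, its parameters, and the bracketed output layout are all assumptions made for illustration. It reads the input one line at a time, so memory use stays flat, at the cost of limiting context to the line that contains the match.

```python
import re
from typing import Iterable, Iterator


def concordance(lines: Iterable[str], pattern: str, width: int = 30) -> Iterator[str]:
    """Yield one keyword-in-context string per match of `pattern`.

    Hypothetical helper, not the linked Perl script. Input is consumed
    line by line, so a large file is never held in memory as a whole.
    Context never crosses a line boundary in this simplified version.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        # Collapse runs of whitespace and drop the trailing newline.
        text = " ".join(line.split())
        for m in regex.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so the keywords line up.
            yield f"{left:>{width}} [{m.group(0)}] {right}"
```

A file object can be passed directly as `lines`, for example `concordance(open("corpus.txt", encoding="utf-8"), r"\bcorpus\b")`, where `corpus.txt` is a placeholder file name.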

Pierre
--
Pierre Nugues, Lunds Tekniska Högskola, Institutionen för
datavetenskap, Box 118, S-221 00 Lund, Sweden.
Tel. (0046) 46 222 96 40, http://www.cs.lth.se/~pierre
Visitors: Lunds Tekniska Högskola, E-huset, rum 4134A, Ole Römers väg
3, S-223 63 Lund.
My book: http://www.cs.lth.se/home/Pierre_Nugues/ilppp/


On 3 Sept. 2008 at 08:41, Paul Johnston wrote:

> On Wednesday 03 September 2008 01:10:05
> jaime.hunt at studentmail.newcastle.edu.au wrote:
>> Hello everyone,
>>
>> I was just wondering if you know of a good concordance program that
>> deals with large files of over 1 million words that I might be able
>> to use for my research. Has anyone had any experience with one? There
>> are a few free ones on the internet, but they often don't deal with
>> really large files.
>>
>> Regards,
>> Jaime
>>
>> Mr Jaime Hunt MAppLing (TESOL), BA (Hons)
>> PhD (Linguistics) Candidate
>> School of Humanities and Social Science
>> McMullin Building
>> University of Newcastle
>> Callaghan
>> NSW 2308
>> Australia
>>
>> Ph. +61 (0)2 4921 5175
>> Email: jaime.hunt at studentmail.newcastle.edu.au
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>
> For REALLY large files you could use the Cambridge-CMU Toolkit to build
> n-grams, word frequency lists, vocabularies and the like.
> It is available at http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html and
> builds under MOST *nixes (well, all the ones I've ever used).
> It's not a pretty graphical tool, but it gets files into formats where
> they become usable.
> It handles the BNC, which is a lot bigger than 1 million words.
> A word of warning, however: I have not used it much with
> Unicode-encoded texts!
>
> Paul
>
>
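
The n-gram and word frequency lists mentioned in the quoted message can be illustrated with a few lines of Python. This is only a sketch of the general idea and does not reproduce the toolkit's commands or file formats: the function name `ngram_counts` and the simple `\w+` tokenisation are assumptions. Like a streaming concordancer, it reads one line at a time, so only the table of counts grows with the corpus.

```python
import re
from collections import Counter
from typing import Iterable


def ngram_counts(lines: Iterable[str], n: int = 2) -> Counter:
    """Count word n-grams, reading the input one line at a time.

    Hypothetical helper for illustration. Tokens are lower-cased runs
    of word characters; n-grams do not span line breaks here.
    """
    counts: Counter = Counter()
    for line in lines:
        tokens = re.findall(r"\w+", line.lower())
        # Slide a window of n tokens across the line.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

With `n=1` the result is a plain word frequency list, and `ngram_counts(handle, 2).most_common(20)` would list the twenty most frequent bigrams of an open file `handle`.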

