Corpora: re: corpus indexing program
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Wed Jun 5 12:55:57 UTC 2002
On Sat, Jun 01, 2002 at 12:53:16PM +0200, E.S. wrote:
> Can anyone direct me to a corpus indexing program that does fast
> searches? I have dabbled in Wordsmith and Winconcord for Windows, but
> neither does a complete index of my entire database of text,
> approximately 2 GB, and both seem to take about 20 minutes on a Pentium
> 233 for one search.
The SARA program developed for the BNC (which is slightly more than 2
GB of text) would handle this job easily. Whether it would search your
corpus better than your current combination of tools depends on how
the text in your corpus is organized. If you would like to send me a
few sample files, I'd be glad to test it for you.
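The speed difference described in the question comes from indexing. A
tool that scans the text for every query must read all ~2 GB each time,
whereas a tool that builds a complete index once can look up a word's
occurrences directly. The sketch below illustrates that idea with a
generic in-memory inverted index in Python. It is an assumption-laden
illustration, not a description of how SARA's indexer works: the
tokenizer (lowercased runs of letters, digits and apostrophes) and the
function names build_index, search and concordance are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercased runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Build an inverted index mapping word -> [(doc_id, token_position), ...].

    docs maps a document id (e.g. a filename) to its text. The corpus is
    read once here; later searches use the index instead of rescanning.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].append((doc_id, pos))
    return index


def search(index, word):
    """Return every (doc_id, position) where word occurs: one dictionary lookup.

    dict.get is used so that looking up an absent word does not add an
    empty entry to the defaultdict.
    """
    return index.get(word.lower(), [])


def concordance(docs, index, word, width=3):
    """Build simple KWIC lines from the positions stored in the index.

    For brevity this re-tokenizes each matching document; a real tool
    would store tokens or byte offsets alongside the index.
    """
    lines = []
    for doc_id, pos in search(index, word):
        tokens = tokenize(docs[doc_id])
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append(f"{doc_id}: {left} [{tokens[pos]}] {right}")
    return lines


if __name__ == "__main__":
    docs = {
        "a.txt": "The corpus is large. Searching the corpus is slow.",
        "b.txt": "An index makes corpus searches fast.",
    }
    idx = build_index(docs)
    print(search(idx, "Corpus"))
    print(concordance(docs, idx, "index", width=2))
```

The cost is paid once, in build_index. After that, each search is a
lookup whose time depends on the number of hits rather than on the size
of the corpus. This is the property the question asks for when it
mentions "a complete index" of the whole database.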
We are currently working on a major new version of the SARA program,
which will include several enhancements to the indexer. Any strong
views people have on how indexing of large corpora should be specified
would be gratefully received, and I hope to demonstrate the new
version at TALC next month.
Lou Burnard