[Corpora-List] Free n-gram Software Released
William H. Fletcher
fletcher at usna.edu
Mon Sep 30 21:39:32 UTC 2002
The recent flurry of discussion on n-gram software inspired me to revisit a
project from last year. I reprogrammed kfNgram using aspects of the
"suffix array" approach described by Mikio Yamamoto and Kenneth W. Church
and further developed by Chunyu Kit and Yorick Wilks. The result was a
quantum leap in performance that makes it useful even for large corpora.
(It indexes the 25-million-word CETENFolha corpus announced here last week
in about 10 minutes on my 800 MHz Pentium III machine with 256 MB RAM,
then cranks out n-gram files in under a minute.)
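
For readers unfamiliar with the suffix-array idea, here is a minimal Python
sketch of how it can be used to count n-grams. This is only an illustration
of the general technique, not kfNgram's actual code; the function names
(build_suffix_array, count_ngrams) are hypothetical, and the naive sort is
an assumption made for brevity that would be far too slow for a corpus of
millions of words.

```python
def build_suffix_array(tokens):
    # Positions of all suffixes, sorted by the token sequence that starts there.
    # Naive construction: simple to read, not suitable for large corpora.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngrams(tokens, n):
    # In a sorted suffix array, all suffixes that begin with the same n tokens
    # are adjacent, so one linear pass is enough to count every n-gram.
    counts = {}
    prev = None
    for pos in build_suffix_array(tokens):
        if pos + n > len(tokens):
            continue  # suffix is shorter than n tokens; no n-gram starts here
        gram = tuple(tokens[pos:pos + n])
        if gram == prev:
            counts[gram] += 1
        else:
            counts[gram] = 1
            prev = gram
    return counts


if __name__ == "__main__":
    words = "the cat saw the cat on the mat".split()
    for gram, freq in sorted(count_ngrams(words, 2).items()):
        print(" ".join(gram), freq)
```

The point of the approach is that the index is built once; n-grams of any
length can then be read off the same sorted array by changing n.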
kfNgram supports user-defined character sets and sort orders, and its GUI
(graphical user interface) makes it accessible even to casual users.
This free Windows program is available at
http://miniappolis.com/KWiCFinder/kfNgramHelp.html
Suggestions and comments on its usability and performance will be greatly
appreciated.
Bill Fletcher
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
William H. Fletcher 410.293.6362 [voice]
Associate Professor, German & Spanish 410.293.2729 [fax]
Language Studies Department
US Naval Academy
589 McNair Road
Annapolis, MD 21402 - 5030
fletcher at usna.edu
http://www.usna.edu/LangStudy/
http://kwicfinder.com/
http://miniappolis.com/
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -