[Corpora-List] Free n-gram Software Released

William H. Fletcher fletcher at usna.edu
Mon Sep 30 21:39:32 UTC 2002


The recent flurry of discussion on n-gram software inspired me to revisit a
project from last year. I reprogrammed kfNgram using aspects of the
"suffix array" approach described by Mikio Yamamoto and Kenneth W. Church
and further developed by Chunyu Kit and Yorick Wilks. The result was a
quantum leap in performance that makes it useful even for large corpora.
(It indexes the 25-million-word CETENFolha corpus announced here last week
in about 10 minutes on my Pentium III machine with an 800 MHz processor and
256 MB RAM, then cranks out n-gram files in under a minute.)
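
For readers unfamiliar with the idea, here is a minimal Python sketch of
counting n-grams with a word-level suffix array. It is an illustration
only, not kfNgram's actual code: the function names build_suffix_array and
count_ngrams are hypothetical, and the naive sort below is far slower than
the construction methods a real tool would need for a large corpus. The
point it shows is that once all suffixes are sorted, identical n-grams sit
next to each other, so a single pass over the index yields counts for any n.

```python
def build_suffix_array(tokens):
    """Return the start positions of all token suffixes in sorted order.

    Naive construction for illustration: each comparison may look at a
    whole suffix, so this does not scale to large corpora.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngrams(tokens, suffix_array, n):
    """Count n-grams with one pass over the sorted suffixes.

    Suffixes that share their first n tokens are adjacent in the suffix
    array, so each run of equal n-grams is counted in sequence.
    """
    counts = {}
    previous = None
    for start in suffix_array:
        gram = tuple(tokens[start:start + n])
        if len(gram) < n:
            # Suffix too short to contain an n-gram of this length.
            continue
        if gram == previous:
            counts[gram] += 1
        else:
            counts[gram] = 1
            previous = gram
    return counts


if __name__ == "__main__":
    words = "to be or not to be".split()
    index = build_suffix_array(words)       # built once
    for n in (1, 2, 3):                     # reused for every n
        print(n, count_ngrams(words, index, n))
```

The same index serves every value of n, which is consistent with the
pattern described above: a slower indexing step followed by quick
generation of the n-gram files.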

kfNgram supports user-defined character sets and sort orders, and its GUI
(graphical user interface) makes it accessible even to casual users.

This free Windows program is available at
http://miniappolis.com/KWiCFinder/kfNgramHelp.html
Suggestions and comments on its usability and performance will be greatly
appreciated.

Bill Fletcher

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  William H. Fletcher              410.293.6362 [voice]
  Associate Professor, German & Spanish   410.293.2729 [fax]
  Language Studies Department
  US Naval Academy
  589 McNair Road
  Annapolis, MD 21402-5030

  fletcher at usna.edu
  http://www.usna.edu/LangStudy/
  http://kwicfinder.com/
  http://miniappolis.com/

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


