Corpora: bigram statistics package v0.5

ted pedersen tpederse at d.umn.edu
Tue Jun 4 19:10:22 UTC 2002


BSP is now NSP!

Version 0.5 of the Bigram Statistics Package is now available, and
has been renamed the N-gram Statistics Package (NSP v0.5).

NSP is an easy-to-use suite of Perl tools for counting and analyzing
word n-grams in text. It provides a number of standard tests of
association that can be used to identify word n-grams in large corpora,
and lets users implement other tests with very little knowledge of
Perl.
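
For readers new to the idea, here is a minimal sketch of what counting
word bigrams and scoring them with a test of association can look like.
It is written in Python for illustration only and is not NSP's code or
interface: the function names (count_bigrams, pmi), the choice of
pointwise mutual information as the measure, and the toy sentence are
all assumptions made for this example.

```python
import math
from collections import Counter


def count_bigrams(tokens):
    """Count adjacent word pairs plus the marginal counts needed for scoring."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    left = Counter()   # how often each word appears as the first word of a bigram
    right = Counter()  # how often each word appears as the second word of a bigram
    for (w1, w2), n in bigrams.items():
        left[w1] += n
        right[w2] += n
    total = sum(bigrams.values())
    return bigrams, left, right, total


def pmi(w1, w2, bigrams, left, right, total):
    """Pointwise mutual information (base 2) of the bigram (w1, w2)."""
    n12 = bigrams[(w1, w2)]
    if n12 == 0:
        return float("-inf")  # unseen bigram: no evidence of association
    return math.log2(n12 * total / (left[w1] * right[w2]))


# Hypothetical toy corpus, already split into words.
tokens = "new york is a big city and new york is busy".split()
bigrams, left, right, total = count_bigrams(tokens)
score = pmi("new", "york", bigrams, left, right, total)
```

Higher scores mean the two words occur together more often than their
individual frequencies would suggest. Other measures can be swapped in
by replacing the pmi function while keeping the same counts.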

Earlier versions of this package were known as the Bigram Statistics
Package (BSP v0.1, v0.3, v0.4) and dealt exclusively with word bigrams
(two-word sequences). NSP v0.5 is backwards compatible with these
earlier versions, and adds support for word n-grams.

Also new in v0.5 are user-defined tokenization using regular
expressions, stop lists, and an extensive collection of test/sample scripts.
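
As a rough illustration of how a regular expression can define what
counts as a token, and how a stop list can then filter the resulting
bigrams, here is a small Python sketch. It does not reproduce NSP's
options or file formats. The token pattern, the stop words, and the
rule of dropping a bigram when either word is a stop word are
assumptions chosen for this example, and NSP's own behavior may differ.

```python
import re

# Assumed token definition: a run of letters, optionally followed by an
# apostrophe and more letters (so "cat's" stays one token).
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

# Hypothetical stop list for the example.
STOP_WORDS = {"the", "of", "a", "and"}


def tokenize(text, pattern=TOKEN_PATTERN):
    """Return lowercased tokens matched by the user-supplied pattern."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


def bigrams_without_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only bigrams in which neither word is on the stop list."""
    return [
        (w1, w2)
        for w1, w2 in zip(tokens, tokens[1:])
        if w1 not in stop_words and w2 not in stop_words
    ]


tokens = tokenize("The cat's toy, and the dog.")
kept = bigrams_without_stop_words(tokens)
```

Changing TOKEN_PATTERN changes what a "word" is (for example, whether
punctuation or digits form tokens), which in turn changes every count
derived from the text.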

This is free software. Download it (or view the README) at:
http://www.d.umn.edu/~tpederse/nsp.html

Enjoy!
Ted

--
# Ted Pedersen                            http://www.d.umn.edu/~tpederse #
# Department of Computer Science                      tpederse at d.umn.edu #
# University of Minnesota Duluth                                         #
# Duluth, MN 55812                                        (218) 726-8770 #


