Corpora: bigram statistics package (v0.1)

ted pedersen tpederse at d.umn.edu
Thu Dec 7 22:36:15 UTC 2000


I'd like to announce the availability of the Bigram Statistics Package.
This is an easy-to-use tool for counting and analyzing bigram frequencies
in text. It is free software (written in Perl) that you can download from:

http://www.d.umn.edu/~tpederse/code.html
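
For readers unfamiliar with the idea, counting bigram frequencies can be
sketched in a few lines. This is only an illustration in Python, not code
from the package (which is written in Perl); the function name
count_bigrams and the lowercase, whitespace-based tokenization are
assumptions made for the example.

```python
from collections import Counter


def count_bigrams(text):
    """Count adjacent word pairs in lowercased, whitespace-tokenized text.

    Hypothetical helper for illustration; the tokenization used by the
    actual package may differ.
    """
    tokens = text.lower().split()
    # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
    return Counter(zip(tokens, tokens[1:]))


counts = count_bigrams("the cat sat on the mat and the cat slept")
for bigram, freq in counts.most_common(3):
    print(bigram, freq)
```

The resulting counts are the raw material for the statistical tests: each
test needs the joint frequency of a bigram plus the marginal frequencies
of its two words.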

The following statistical tests are currently supported:

Fisher's exact test, the likelihood ratio, Pearson's chi-squared test,
the Dice coefficient, and mutual information
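
As a rough guide to what these measures compute, here is a minimal Python
sketch using their standard textbook definitions over a 2x2 contingency
table. It is not the package's implementation. The argument names (n11 for
the bigram count, n1p and np1 for the counts of the first and second word
in their positions, npp for the total number of bigrams) are assumptions,
mutual information is taken to mean pointwise mutual information, and the
Fisher function uses the right tail, which is only one of several possible
conventions.

```python
import math


def contingency(n11, n1p, np1, npp):
    """Return observed and expected cell counts for the bigram (w1, w2)."""
    n12 = n1p - n11              # w1 followed by a word other than w2
    n21 = np1 - n11              # w2 preceded by a word other than w1
    n22 = npp - n1p - np1 + n11  # neither w1 first nor w2 second
    n2p, np2 = npp - n1p, npp - np1
    observed = [n11, n12, n21, n22]
    # Expected counts under independence: row total * column total / N
    expected = [n1p * np1 / npp, n1p * np2 / npp,
                n2p * np1 / npp, n2p * np2 / npp]
    return observed, expected


def dice(n11, n1p, np1, npp):
    return 2 * n11 / (n1p + np1)


def pmi(n11, n1p, np1, npp):
    """Pointwise mutual information in bits (requires n11 > 0)."""
    return math.log2(n11 * npp / (n1p * np1))


def chi_squared(n11, n1p, np1, npp):
    obs, exp = contingency(n11, n1p, np1, npp)
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def log_likelihood(n11, n1p, np1, npp):
    """Likelihood ratio statistic G^2; empty cells contribute zero."""
    obs, exp = contingency(n11, n1p, np1, npp)
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)


def fisher_right(n11, n1p, np1, npp):
    """Right-tailed Fisher's exact test: P(joint count >= n11)."""
    total = math.comb(npp, np1)
    upper = min(n1p, np1)
    tail = sum(math.comb(n1p, k) * math.comb(npp - n1p, np1 - k)
               for k in range(n11, upper + 1))
    return tail / total


# Example: the bigram occurs 10 times, each word 20 times, in 100 bigrams.
for measure in (dice, pmi, chi_squared, log_likelihood, fisher_right):
    print(measure.__name__, measure(10, 20, 20, 100))
```

All five functions share one signature, so a list of bigrams can be ranked
under any of them by swapping the function.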

BSP also provides:

1) A tool for comparing ranked lists of bigrams from two different
tests. This allows you to measure the difference in the rankings
obtained from test X and test Y for a given corpus.

2) The ability to easily implement and incorporate your own tests into the
package. The package is designed so you can do so with minimal knowledge
of Perl and our underlying implementation.
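
To illustrate item 1, one common way to measure the difference between two
rankings is Spearman's rank correlation coefficient. The announcement does
not say which measure the comparison tool uses, so treat the choice of
Spearman here, along with the function names ranks and spearman, as
assumptions of this Python sketch.

```python
import math


def ranks(scores):
    """Map each bigram to its rank (1 = highest score).

    Tied scores share the average of the ranks they span.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of items tied with ordered[i]
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[ordered[k]] = average_rank
        i = j + 1
    return result


def spearman(scores_x, scores_y):
    """Pearson correlation of the rank vectors over bigrams scored by both tests.

    Raises ZeroDivisionError if either test ties every bigram.
    """
    shared = sorted(set(scores_x) & set(scores_y))
    rx = ranks({b: scores_x[b] for b in shared})
    ry = ranks({b: scores_y[b] for b in shared})
    mean = (len(shared) + 1) / 2  # mean rank, also valid with averaged ties
    cov = sum((rx[b] - mean) * (ry[b] - mean) for b in shared)
    var_x = sum((rx[b] - mean) ** 2 for b in shared)
    var_y = sum((ry[b] - mean) ** 2 for b in shared)
    return cov / math.sqrt(var_x * var_y)


test_x = {("new", "york"): 9.1, ("of", "the"): 2.0, ("the", "cat"): 0.5}
test_y = {("new", "york"): 0.8, ("of", "the"): 0.1, ("the", "cat"): 0.3}
print(spearman(test_x, test_y))
```

A value near 1 means the two tests order the bigrams almost identically,
while a value near -1 means one ranking is close to the reverse of the
other.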

We would be very interested to hear if you find this code useful (or not).
This is an ongoing project, so suggestions for improvements, fixes, etc.
would be much appreciated.

Enjoy!

Ted

--
# Ted Pedersen                            http://www.d.umn.edu/~tpederse #
# Department of Computer Science                      tpederse at d.umn.edu #
# University of Minnesota Duluth                                         #
# Duluth, MN 55812                                        (218) 726-8770 #


