Corpora: stats summary

Marco Antonio Esteves da Rocha marcor at cce.ufsc.br
Fri Mar 2 00:11:52 UTC 2001


Dear all,

Here is the summary of free or cheap statistical resources mentioned by
list members in response to my query:

*************************************************

There's a student version of SPSS that is pretty much like the $900 (US)
version but lacks some of the more advanced statistical tests (e.g.
loglinear analyses). I've used it and it's quite good. I forget the exact
price, but it's under $100 (US).

-Charles Meyer, UMass-Boston

*************************************************

From: Cam.Fordyce at lhsl.com, Sat Feb 24 20:37:56 2001

Hi Marco,

You could look at www.perl.com or any other site that has access to CPAN,
an archive of modules. There you will find the following modules that
might be of use.

Cam

Here is a listing of some of the statistics-related modules available at
the above site.


      Math::CDF -- Module
      Math::CDF gives probabilities and quantiles from several statistical
      probability functions, including the normal distribution, t-dist,
      F-dist and others. Non-centrality functions are available for some
      distributions. The module is an interface to the DCDFLIB library of
      C programs. The DCDFLIB source is included with the Math::CDF module
      with permission of its authors.

      Statistics::ChiSquare -- Module
      How random is your data? The Chi Square test tells you.

      Statistics::Descriptive -- Module
      Commonly used statistical methods: mean, variance, standard
      deviation, least squares fit, and so on.

      Statistics::LTU -- Module
      A module for manipulating Linear Threshold Units, also called
      perceptrons, which are neural networks with no hidden layers.

      Statistics::MaxEntropy -- Module
      Object-oriented implementation of Generalised Iterative Scaling
      algorithm, Improved Iterative Scaling algorithm, and Feature
      Induction algorithm for inducing maximum entropy probability
      distributions.

      Statistics::OLS -- Module
      Statistics::OLS (Ordinary Least Squares) computes the estimated
      slope and intercept of the regression line, their T-statistics, R
      squared, standard error of the regression and the Durbin-Watson
      statistic. It can also return the residuals.

      Statistics::ROC -- Module
      Statistics::ROC (receiver-operator-characteristic) determines the
      ROC curve and its nonparametric confidence bounds for data
      categorized into two groups. A ROC curve shows the relationship of
      probability of false alarm (x-axis) to probability of detection
      (y-axis) for a certain test. Expressed in medical terms: it relates
      the probability of a positive test given no disease to the
      probability of a positive test given disease. The ROC curve may be
      used to determine an optimal cutoff point for the test.
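
To make the Statistics::OLS entry more concrete, here is a minimal sketch of the quantities it is described as computing: slope, intercept, R squared, residuals and the Durbin-Watson statistic. It is written in Python rather than Perl and is not the module's code or API; the function name ols_fit and the sample data are assumptions made purely for illustration.

```python
import math


def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration only; this is NOT the
    Statistics::OLS interface.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residuals: observed y minus fitted y.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)          # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)     # total variation

    r_squared = 1 - sse / sst if sst > 0 else math.nan

    # Durbin-Watson: squared successive residual differences over SSE.
    # Undefined when the fit is perfect (SSE == 0).
    if sse > 0:
        durbin_watson = sum(
            (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)
        ) / sse
    else:
        durbin_watson = math.nan

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "residuals": residuals,
        "durbin_watson": durbin_watson,
    }


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    fit = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    for name in ("slope", "intercept", "r_squared", "durbin_watson"):
        print(name, fit[name])
```

The T-statistics and the standard error of the regression mentioned in the module description are left out to keep the sketch short.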

****************************************************************


Hi,
I certainly wish you the best of luck with your
project - I think statistical work is the way to go.  :)

I've heard of, but not yet used, a free statistical
programming package called R.  It's a freeware
counterpart to the very popular S and S-Plus
stats programming packages.  Here's a URL:
 http://www.r-project.org/


************************************************************

From: henning.reetz at uni-konstanz.de, Sat Feb 24 20:38:44 2001

Hi Marco,

You should take a look at the JMP package from SAS. We pay about the
equivalent of $50 for our university-related licence; the normal price
is something like $500, so check whether you can get it at a lower
price via a research institution related to you. There is also a
student version, JMP IN. It has a graphical user interface (on Mac and
Windows; I don't know about UNIX/Linux versions). It is a
general-purpose system with lots of graphical representation (you can
AND/OR graphically), has a very complex ANOVA (it can handle many more
things than SPSS), and is fast and reliable (SPSS runs any ANOVA,
whereas JMP barks if there are linear dependencies in the data). The
user interface is okay once you have mastered the sometimes strange
concepts (e.g., they use a postfix language for their logical terms),
and you can also write scripts.

I use it for many more things than statistical evaluation; for
example, you can formulate things like "select all 3-syllable words
from the CELEX database and sort them by the medial syllable" (once
you have read in the CELEX database).

The URL is http://www.jmpdiscovery.com/ and you can also download a demo
there.

The only thing I don't know is whether you can somehow get it for a
price as low as $50, but first take a look at whether it would be
interesting at all. It is much easier to handle than anything else,
and there is no comparison with the normal SAS package.

Henning Reetz

******************************************************

From: "TOYOSHIMA,Masayuki" <mtoyo at aa.tufs.ac.jp>

I have written three tests in Perl:
        Chi-square test     (table-t.pl)
        T-test              (avrg-t.pl)
        test of proportions (ratio-t.pl)
http://jcs.aa.tufs.ac.jp/mtoyo/stats/stats-pl.zip

I am sorry to say that all the documentation is in Japanese.
But the Perl scripts themselves are in Perl :-) with comments in
English.
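
For readers unfamiliar with the first of these tests, the following is a minimal sketch of Pearson's chi-square statistic for a table of counts. It is written in Python and is not a port of table-t.pl, whose contents are not shown here; the function name chi_square_statistic and the example counts are assumptions for illustration only.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    Hypothetical helper for illustration; `table` is a list of rows,
    each row a list of non-negative observed counts.
    """
    n_cols = len(table[0])
    if any(len(row) != n_cols for row in table):
        raise ValueError("all rows must have the same number of columns")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (n_cols - 1)
    return chi2, degrees_of_freedom


if __name__ == "__main__":
    # Made-up counts: occurrences of two word forms in two corpora.
    chi2, df = chi_square_statistic([[10, 20], [20, 10]])
    print(chi2, df)
```

The sketch stops at the statistic and its degrees of freedom; turning these into a p-value needs the chi-square distribution, which a module such as Math::CDF is described above as providing.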

*********************************************************

From: manning at cs.stanford.edu, Sat Feb 24 20:43:17 2001

Marco,

You could try R, a totally free implementation of the S statistics
programming language:

        http://www.r-project.org/

R has everything you need.  The main possible disadvantage of R (or S)
versus packages like SPSS/SAS is that they are programming languages
customized for statisticians rather than statistics packages.  So, they
require more technical competence on the part of users.

********************************************************

From: George Foster <foster at IRO.UMontreal.CA>

Hi,

Lispstat is good, fun, and free, though not particularly intended for NLP.
Have a look at:

http://www.stat.umn.edu/~luke/xls/xlsinfo/xlsinfo.html

George

********************************************************

From: Paul Clough <p.clough at dcs.shef.ac.uk>

Hi Marco,

Have you tried the Perl CPAN pages? A small number of statistical
functions can be found here:

http://www.perl.com/reference/query.cgi?statistics

Also, if you just want a free data analysis/statistical package to use,
have you tried R?

http://cran.r-project.org/doc/manuals/R-intro.pdf

http://cran.r-project.org/

Paul.

*********************************************

From: Patrick Ruch <ruch at dim.hcuge.ch>

For all the above needs, we use S-PLUS. They have very nice educational
prices, about $20 for students (at least, that is what it costs at the
University of Geneva). This is more a matter of marketing, and I do not
know whether the price is the same at every university, but you could get
in touch with S-PLUS sellers to ask for comparable prices!

********************************************

From: John Aitchison <jaitchis at lisp.com.au>

R (also called gnu S) is FREE, runs on a variety of platforms, has a huge
range of procedures....

www.r-project.org

I use it and love it. Forget SPSS and SAS and SPLUS and .. well, you
just need R

*************************************************

From: Mike Scott <lexically at btinternet.com>

Hi Marco Antonio,

http://uk.torry.net/statistic.htm

has Delphi components for statistics, naturally only for those who program
in Pascal. Perhaps some of them will be useful... those marked S are source
included, those marked F are free, etc.

[] Mike Scott

(Mike wrote in Portuguese; the text above is an English translation. His
message says "Dephi", presumably Delphi.)

*****************************************************

From: "Melamed, Dan" <Dan.Melamed at westgroup.com>

Much of what you need is here:


http://www.acm.org/~perlman/statinfo.html

IDM

*************************************************

Thanks to all those people who took the time to respond. The R system got
more votes than any other solution (no hanging chads). I have already
downloaded it and it seems really good. I will be testing Lispstat soon.
Other solutions will be looked into next. I decided to post the summary of
mentioned resources before finishing testing, as this will obviously take
a long time.

Cheers,

Marco


