Corpora: stats summary
Marco Antonio Esteves da Rocha
marcor at cce.ufsc.br
Fri Mar 2 00:11:52 UTC 2001
Dear all,
Here is the summary of free or cheap statistical resources mentioned by
list members in response to my query:
*************************************************
There's a student version of SPSS that is pretty much like the $900 (US)
version but lacks some of the more advanced statistical tests (e.g.
loglinear analyses). I've used it and it's quite good. I forget the exact
price, but it's under $100 (US).
-Charles Meyer, UMass-Boston
*************************************************
From: Cam.Fordyce at lhsl.com, Sat Feb 24 20:37:56 2001
Hi Marco,
You could look at www.perl.com or any other site that has access to CPAN,
an archive of modules. There you will find the following modules that
might be of use.
Cam
Here is the listing of some of the statistics-related modules listed at
the above site.
Math::CDF -- Module
Math::CDF gives probabilities and quantiles from several statistical
probability functions, including the normal distribution, t-dist,
F-dist and others. Non-centrality functions are available for some
distributions. The module is an interface to the DCDFLIB library of
C programs. The DCDFLIB source is included with the Math::CDF module
with permission of its authors.
Statistics::ChiSquare -- Module
How random is your data? The Chi Square test tells you.
Statistics::Descriptive -- Module
Commonly used statistical methods: mean, variance, standard
deviation, least squares fit, and so on.
Statistics::LTU -- Module
A module for manipulating Linear Threshold Units, also called
perceptrons, which are neural networks with no hidden layers.
Statistics::MaxEntropy -- Module
Object-oriented implementation of Generalised Iterative Scaling
algorithm, Improved Iterative Scaling algorithm, and Feature
Induction algorithm for inducing maximum entropy probability
distributions.
Statistics::OLS -- Module
Statistics::OLS (Ordinary Least Squares) computes the estimated
slope and intercept of the regression line, their T-statistics, R
squared, standard error of the regression and the Durbin-Watson
statistic. It can also return the residuals.
Statistics::ROC -- Module
Statistics::ROC (receiver-operator-characteristic) determines the ROC
curve and its nonparametric confidence bounds for data categorized
into two groups. A ROC curve shows the relationship of the probability
of false alarm (x-axis) to the probability of detection (y-axis) for a
certain test. Expressed in medical terms: the probability of a
positive test given no disease, against the probability of a positive
test given disease. The ROC curve may be used to determine an
optimal cutoff point for the test.
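
The quantities listed above for Statistics::OLS (slope, intercept, R
squared, residuals, Durbin-Watson statistic) can be sketched in a few lines.
This is only a minimal illustration in plain Python, not the Perl module's
API; the function name ols_fit and the returned dictionary keys are
made up for this example.

```python
# Minimal ordinary least squares sketch (pure Python, standard library only).
# ols_fit and its dictionary keys are illustrative names, not the
# Statistics::OLS interface.

def ols_fit(xs, ys):
    """Fit y = intercept + slope * x and return summary statistics."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)

    # R squared is undefined when y has no variance at all.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Durbin-Watson: squared successive residual differences over ss_res.
    # Undefined when the fit is perfect (ss_res == 0).
    if ss_res > 0:
        dw = sum((residuals[i] - residuals[i - 1]) ** 2
                 for i in range(1, n)) / ss_res
    else:
        dw = float("nan")

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "residuals": residuals,
        "durbin_watson": dw,
    }


if __name__ == "__main__":
    fit = ols_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
    print(fit["slope"], fit["intercept"], fit["r_squared"])
```

The T-statistics and the standard error of the regression mentioned in the
module description are left out to keep the sketch short.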
****************************************************************
Hi,
I certainly wish you the best of luck with your
project - I think statistical work is the way to go. :)
I've heard of, but not yet used, a free statistical
programming package called R. It's a freeware
counterpart to the very popular S and S-Plus
stats programming packages. Here's a URL:
http://www.r-project.org/
************************************************************
From: henning.reetz at uni-konstanz.de, Sat Feb 24 20:38:44 2001
Hi Marco,
you should take a look at the JMP package from SAS. We pay about the
equivalent of $50 for our university-related licence; the normal price
is something like $500, so check whether you can get it for a lower
price via a research institution related to you. There is also a
student version, JMP IN. It has a graphical user interface (on Mac and
Windows; I don't know about UNIX/Linux versions). It is a
general-purpose system with lots of graphic representation (you can
AND/OR graphically), has a very complex ANOVA (it can handle many more
things than SPSS), and it's fast and reliable (SPSS runs any ANOVA; JMP
barks if there are linear dependencies in the data). The user interface
is okay once you have mastered the sometimes strange concepts (e.g.,
they use a post-fix language for their logical terms), and you can also
write scripts.
I use it for many more things than statistical evaluation, for
example you can formulate things like "select all 3-syllable words
from the CELEX database and sort them by the medial syllable" (once
you have read in the CELEX database).
The URL is http://www.jmpdiscovery.com/ ; you can also download a demo.
The only thing I don't know is whether you can somehow get it for a
price as low as $50, but first take a look at whether it would be
interesting at all. It is much easier to handle than anything
else and bears no comparison to the normal SAS package.
Henning Reetz
******************************************************
From: "TOYOSHIMA,Masayuki" <mtoyo at aa.tufs.ac.jp>
I have written 3 tests in perl, i.e.
Chi-square test (table-t.pl)
T-test (avrg-t.pl)
test of proportions (ratio-t.pl)
http://jcs.aa.tufs.ac.jp/mtoyo/stats/stats-pl.zip
I am sorry to say that all the documentation is in Japanese.
But the perl scripts themselves are in perl :-) with comments in
English.
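
For readers who cannot use the scripts above, two of the three tests
(chi-square on a 2x2 table and the test of proportions) can be sketched as
follows. This is a rough Python illustration under stated assumptions, not
a port of table-t.pl or ratio-t.pl: the function names are invented here,
the chi-square has no continuity correction, and the p-values use the
normal approximation (chi-square with 1 degree of freedom).

```python
import math

# Illustrative sketch only; chi_square_2x2 and two_proportion_z are
# hypothetical names, not taken from the Perl scripts mentioned above.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X > chi2) equals the two-sided normal tail at sqrt(chi2).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def two_proportion_z(hits1, n1, hits2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance).

    Returns (z, p_value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Example: a word occurs 10 times in 30 samples of one corpus
    # and 30 times in 70 samples of another (made-up counts).
    print(chi_square_2x2(10, 20, 30, 40))
    print(two_proportion_z(10, 30, 30, 70))
```

On a 2x2 table the two tests agree: the square of z equals the uncorrected
chi-square statistic, so the p-values are the same.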
*********************************************************
From: manning at cs.stanford.edu, Sat Feb 24 20:43:17 2001
Marco,
You could try R, a totally free implementation of the S statistics
programming language:
http://www.r-project.org/
R has everything you need. The main possible disadvantage of R (or S)
versus packages like SPSS/SAS is that they are programming languages
customized for statisticians rather than statistics packages. So, they
require more technical competence on the part of users.
********************************************************
From: George Foster <foster at IRO.UMontreal.CA>
Hi,
Lispstat is good, fun, and free, though not particularly intended for NLP.
Have a look at:
http://www.stat.umn.edu/~luke/xls/xlsinfo/xlsinfo.html
George
********************************************************
From: Paul Clough <p.clough at dcs.shef.ac.uk>
Hi Marco,
Have you tried the Perl CPAN pages? A small number of statistical
functions can be found here:
http://www.perl.com/reference/query.cgi?statistics
Also, if you just want a free data analysis/statistical package to use,
have you tried R?
http://cran.r-project.org/doc/manuals/R-intro.pdf
http://cran.r-project.org/
Paul.
*********************************************
From: Patrick Ruch <ruch at dim.hcuge.ch>
For all the above needs, we use S-PLUS. They have very nice edu prices,
about $20 for students (at least, that is what it costs at the University
of Geneva).
This is more a matter of marketing, and I do not know if this price is the
same for every university, but you could get in touch with S-PLUS sellers
to get comparable prices!
********************************************
From: John Aitchison <jaitchis at lisp.com.au>
R (also called gnu S) is FREE, runs on a variety of platforms, has a huge
range of procedures....
www.r-project.org
I use it and love it. Forget SPSS and SAS and SPLUS and .. well, you just
need R
*************************************************
From: Mike Scott <lexically at btinternet.com>
Hi Marco Antonio,
http://uk.torry.net/statistic.htm
has Delphi components for statistics, naturally only for those who
program in Pascal. Maybe some of them will be useful... those marked S are
source included, F free, etc.
[] Mike Scott
(Mike's message was written in Portuguese and is translated here; the
original read "Dephi", presumably Delphi.)
*****************************************************
From: "Melamed, Dan" <Dan.Melamed at westgroup.com>
Much of what you need is here:
http://www.acm.org/~perlman/statinfo.html
IDM
*************************************************
Thanks to all those people who took the time to respond. The R system got
more votes than any other solution (no hanging chads). I have already
downloaded it and it seems really good. I will be testing Lispstat soon.
Other solutions will be looked into next. I decided to post the summary of
mentioned resources before finishing testing, as this will obviously take
a long time.
Cheers,
Marco