Corpora: statistics for the cash-strapped

Marco Antonio Esteves da Rocha marcor at cce.ufsc.br
Thu Feb 22 22:00:09 UTC 2001


Dear all,

I am about to start teaching a postgraduate course on statistics for language
studies. The audience will be made up mostly of linguists and EFL
teachers.

In an effort to include lab practice in the coursework, I began checking
prices for statistical packages such as SPSS and SAS. Although there
are some affordable versions, they have too many limitations. The
wide-coverage versions cost around US$ 900, which is out of the question for
our department.

I have looked into Dan Melamed's page, where I found a lot of useful
stuff in Perl for free. It is extremely helpful for my personal research work,
but for this course I would need more of the standard statistical
procedures, as some of the people taking the course are not into NLP or
parallel corpus manipulation. Procedures I have in mind are significance
tests, measures of correlation in general, the various types of ANOVA,
linear regression, generalised linear models, loglinear analysis,
clustering, linear discriminant analysis and factor analysis, as well as
the basics of central tendency and dispersion measures, although those are
not so difficult with a calculator.

I wonder if someone has produced free or cheap libraries in Perl or any
other language that could be downloaded for use in this course. Perhaps I
underestimate the possibilities of the materials on Melamed's page, because
the descriptions are somewhat succinct and I have not had the time to go
through each and every program.

I am not that good at Perl or programming in general anyway, as my background
is in linguistics (I'm a lot better now, but still have a long way to go), so
I may not have realised the full range of possibilities behind the
code in Melamed's libraries.

If anyone can help with possible adaptations of Dan Melamed's materials
or of any other freely available code, I am ready to face it. Addresses of
web pages with information on how to use Perl for this kind of purpose
would also be helpful. I say Perl because I know the interpreter here works
properly, but I am ready to consider something else.

Thank you,

Marco Rocha


