[Corpora-List] robust statistics

Justin Washtell lec3jrw at leeds.ac.uk
Fri Mar 26 15:28:14 UTC 2010


Dear Rainer,

I would second this observation: robust stats have not been explored as much as they ought to be, especially considering that they may be particularly applicable to language, given how little we know about it.

Certainly many of us in NLP, given our backgrounds, are not familiar with these techniques and their advantages (one still sees plenty of questionable applications of "standard" statistics). For example, I have sometimes found myself challenging the use of linear correlation measures where the assumption of a linear relationship is at best questionable, and I now tend to default to more robust rank correlation measures unless there is a very good reason to do otherwise.
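To illustrate the point about linear versus rank correlation, here is a minimal Python sketch. The function names and the toy data are illustrative assumptions, not taken from any particular study. Spearman's rho is computed as Pearson's r applied to the ranks, with tied values given their average rank. When y rises monotonically but exponentially with x, the rank correlation is exactly 1, while the linear measure comes out well below 1.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment (linear) correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical toy data: y grows monotonically but very non-linearly with x.
xs = list(range(1, 11))
ys = [math.exp(x) for x in xs]

print(f"Pearson:  {pearson(xs, ys):.3f}")
print(f"Spearman: {spearman(xs, ys):.3f}")
```

In practice one would more likely call an existing implementation (for example SciPy's `spearmanr`); the hand-rolled version is only meant to show that the rank measure depends on ordering alone, not on the shape of the relationship.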

There do, however, exist some real problems with these methods. For example, corpus-driven NLP often deals with huge datasets. Whereas one can efficiently calculate running means and variances over huge datasets using very few resources, this may not be easy (or even possible) to do with their robust equivalents (I know of no method of incrementally calculating a median). I dare say, though, that there are plenty of other advantages and disadvantages. I would be very interested in seeing some kind of review of this!
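The asymmetry described above can be sketched as follows, assuming a single pass over a stream of numbers. The `stream()` generator and the `RunningStats` class are hypothetical stand-ins (think of per-document token counts read from a large corpus). Welford's online algorithm keeps only a count, the current mean and a running sum of squared deviations, so its memory stays constant however long the stream is. The exact median, computed here with Python's `statistics.median`, by contrast needs every value retained.

```python
import statistics

class RunningStats:
    """Welford's online algorithm: running mean and variance in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def push(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 for fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def stream():
    """Hypothetical stand-in for a large corpus-derived stream of numbers."""
    for i in range(1, 10_001):
        yield (i * 7919) % 1000


rs = RunningStats()
kept = []  # only needed for the exact median; grows with the stream
for x in stream():
    rs.push(x)      # constant memory, whatever the stream length
    kept.append(x)  # linear memory, just to get the median at the end

print(f"running mean:     {rs.mean:.2f}")
print(f"running variance: {rs.variance:.2f}")
print(f"exact median:     {statistics.median(kept)}")
```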

Justin Washtell
University of Leeds

________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Rainer Ottmueller [rainer.ottmueller at googlemail.com]
Sent: 26 March 2010 02:23
To: CORPORA at uib.no
Subject: [Corpora-List] robust statistics

Dear all!

My first impression is that robust statistics are not used at all in NLP.
Please confirm or refute.

Rainer

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
