[Corpora] [Corpora-List] Calculating statistical significance

Stefan Evert stefanML at collocations.de
Mon Nov 10 18:01:22 UTC 2014


> This seems overly conservative to me. Suppose there is a lot of variance across the folds, but system 1 does exactly 0.5% better than system 2 on every fold. It seems like what you want to do is a t-test on the difference in performance.

That's the _paired_ t-test I suggested.
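
Concretely, here is a minimal sketch of that paired test (the per-fold accuracies are made up for illustration, and scipy.stats.ttest_rel is just one convenient implementation):

    # Paired t-test on per-fold scores from the same cross-validation split.
    from scipy import stats

    # Hypothetical accuracies of the two systems on the same 10 folds.
    sys1 = [0.915, 0.880, 0.902, 0.868, 0.931, 0.890, 0.910, 0.875, 0.920, 0.885]
    sys2 = [0.911, 0.874, 0.897, 0.865, 0.924, 0.885, 0.906, 0.869, 0.915, 0.880]

    # ttest_rel tests whether the mean of the pairwise differences is zero,
    # so the large variance *across* folds cancels out; only the variance of
    # the per-fold differences matters.  That is the point of pairing.
    t, p = stats.ttest_rel(sys1, sys2)
    print(f"t = {t:.3f}, p = {p:.4f}")

In this example both systems fluctuate considerably across folds, but system 1 is consistently about 0.5% better, so the paired test comes out highly significant; an unpaired t-test on the same numbers would not.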

> That said, there are definitely machine learning / stats papers that argue against computing variance across cross-validation folds. I can't find the exact reference I'm thinking of, but the related work section of Demsar (JMLR 2006) seems like a useful starting point.
> http://machinelearning.wustl.edu/mlpapers/paper_files/Demsar06.pdf

Thanks for the interesting reference.  I wonder in what sense variance is underestimated by the cross-validation procedure (except with respect to the dependency of the results on the training data, but that is something that is usually ignored in machine learning).

> One could also apply a sign test in this case, which I personally find easier to understand. The trouble is that you may not have access to Sys 2's outputs on each instance (suppose you only know its reported accuracy); in this case, you can't apply the sign test or McNemar's test.

Sign tests are intended for situations where you have numerical (or at least ordinal) measurements.  If you enforce this by coding, e.g., a correct tag as 1 and a wrong tag as 0, then the sign test should give you exactly the same result as McNemar's test.
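
To make that concrete, here is a small sketch using scipy.stats.binomtest (the per-item judgements are invented; the point is that on 0/1-coded data the sign test drops the ties and reduces to a binomial test on the discordant items, which is the exact form of McNemar's test):

    # Sign test on 0/1-coded tagging decisions = exact McNemar's test.
    from scipy.stats import binomtest

    # Hypothetical per-item correctness of two taggers (1 = correct tag).
    sys1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
    sys2 = [1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]

    # Discordant items, i.e. those where exactly one system is correct;
    # items where both are right or both are wrong are ties and drop out.
    b = sum(1 for x, y in zip(sys1, sys2) if x == 1 and y == 0)  # sys1 wins
    c = sum(1 for x, y in zip(sys1, sys2) if x == 0 and y == 1)  # sys2 wins

    # The sign test asks whether the +1 and -1 differences are balanced,
    # i.e. a binomial test of b successes in b + c trials at p = 0.5, which
    # is exactly what the exact (binomial) McNemar's test computes.
    p = binomtest(b, b + c, 0.5).pvalue
    print(f"b = {b}, c = {c}, p = {p:.4f}")

(Strictly speaking, the equivalence holds for the exact binomial form of McNemar's test; the widely used chi-squared approximation (b - c)^2 / (b + c) can give slightly different p-values.)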

Best,
Stefan