[Corpora] [Corpora-List] Calculating statistical significance
Jacob Eisenstein
jacobe at gmail.com
Mon Nov 10 17:04:22 UTC 2014
>
> That depends very much on what your task looks like. It might be easiest
> – and is often done in computational linguistics – to carry out a ten-fold
> cross-validation and apply a paired t-test to the quality measure of your
> choice (e.g. F-score). To be precise, sample A would be the F-scores
> achieved by Sys 1 across the ten folds, and sample B the F-scores achieved
> by Sys 2 on _exactly the same_ folds (and in _exactly the same order_).
>
This seems overly conservative to me. Suppose there is a lot of variance
across the folds, but system 1 does exactly 0.5% better than system 2 on
every fold. It seems like what you want to do is a t-test on the difference
in performance.
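
A minimal sketch of the scenario described above, using only the Python standard library. The fold scores are made up for illustration, and the helper names (paired_t_statistic, welch_t_statistic) are hypothetical. It computes the t statistic on the per-fold differences and, for contrast, an unpaired (Welch) statistic that ignores the pairing:

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a t-test on per-fold differences.

    Both lists must hold scores on the same folds, in the same order.
    Raises ZeroDivisionError if every difference is identical.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def welch_t_statistic(scores_a, scores_b):
    """Unpaired (Welch) t statistic, which ignores the fold pairing."""
    se_squared = (statistics.variance(scores_a) / len(scores_a)
                  + statistics.variance(scores_b) / len(scores_b))
    return (statistics.mean(scores_a) - statistics.mean(scores_b)) / math.sqrt(se_squared)


# Hypothetical F-scores: large variance across folds, but Sys 1 is
# consistently about 0.5 points better than Sys 2 on every fold.
sys2 = [0.70, 0.82, 0.65, 0.90, 0.75, 0.60, 0.85, 0.78, 0.68, 0.88]
gaps = [0.004, 0.006, 0.005, 0.005, 0.004, 0.006, 0.005, 0.005, 0.006, 0.004]
sys1 = [round(s + g, 3) for s, g in zip(sys2, gaps)]

t_paired, df = paired_t_statistic(sys1, sys2)
t_unpaired = welch_t_statistic(sys1, sys2)
print(f"paired t = {t_paired:.2f} (df = {df}), unpaired t = {t_unpaired:.2f}")
```

With these made-up numbers the paired statistic is far above the two-sided 5% critical value for 9 degrees of freedom (about 2.262), while the unpaired statistic is close to zero, because the fold-to-fold variance swamps the small but consistent gap.
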
That said, there are definitely machine learning / stats papers that argue
against computing variance across cross-validation folds. I can't find the
exact reference I'm thinking of, but the related work section of Demsar
(JMLR 2006) seems like a useful starting point.
http://machinelearning.wustl.edu/mlpapers/paper_files/Demsar06.pdf
> For a tagging task evaluated in terms of accuracy, you can apply McNemar's
> test to the output of the two systems. The samples correspond to all
> tokens in the test set, and the observed values are (i) whether Sys 1 is
> correct on this token and (ii) whether Sys 2 is correct on this token.
>
One could also apply a sign test in this case, which I personally find
easier to understand. The trouble is that you may not have access to Sys
2's outputs on each instance (suppose you only know its reported accuracy);
in this case, you can't apply the sign test or McNemar's test.
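
A minimal sketch of the sign test on per-token correctness, assuming both systems' outputs are available as parallel lists of booleans (the function name and the toy data are hypothetical). Only tokens where exactly one system is correct carry information; under the null hypothesis each such token favours either system with probability 0.5. This two-sided exact binomial computation is also the exact form of McNemar's test:

```python
from math import comb


def sign_test_p_value(sys1_correct, sys2_correct):
    """Two-sided exact sign test on paired per-token correctness flags."""
    if len(sys1_correct) != len(sys2_correct):
        raise ValueError("both systems must be evaluated on the same tokens")
    # Count discordant tokens: exactly one of the two systems is correct.
    only_sys1 = sum(1 for a, b in zip(sys1_correct, sys2_correct) if a and not b)
    only_sys2 = sum(1 for a, b in zip(sys1_correct, sys2_correct) if b and not a)
    n = only_sys1 + only_sys2
    if n == 0:
        return 1.0  # the systems never disagree, so there is no evidence
    k = min(only_sys1, only_sys2)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data: 90 tokens both systems get right, 8 only Sys 1 gets right,
# 2 only Sys 2 gets right.
sys1_flags = [True] * 90 + [True] * 8 + [False] * 2
sys2_flags = [True] * 90 + [False] * 8 + [True] * 2

print(sign_test_p_value(sys1_flags, sys2_flags))  # 0.109375
```

Tokens on which the systems agree drop out of the computation entirely, which is why per-token outputs from both systems are needed and a reported accuracy alone is not enough.
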
-Jacob
>
> Hope this helps,
> Stefan
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>