[Corpora] [Corpora-List] Calculating statistical significant

Stefan Evert stefanML at collocations.de
Mon Nov 10 18:06:54 UTC 2014


> If you have the outputs of both systems on each instance, you may try bootstrap resampling, as done here: http://genomebiology.com/2008/9/S2/S2

Indeed, if you have the full system outputs and if you believe that your test data form a random sample from the population of interest, you can apply bootstrap resampling in order to obtain confidence intervals for non-trivial evaluation criteria such as precision (P), recall (R) and F-score.
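
To make the bootstrap idea concrete, here is a minimal sketch of a percentile bootstrap interval for the F-score of a single system. It assumes binary gold and predicted labels (1 = positive class) stored in two parallel lists; the function names f_score and bootstrap_ci, the number of resamples and the fixed seed are illustrative choices, not taken from the paper linked above.

```python
import random

def f_score(gold, pred):
    """F1 for binary labels, where 1 marks the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(gold, pred, metric=f_score, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` (hypothetical helper).

    Test items are resampled with replacement; gold label and system
    output of an item always stay together.
    """
    rng = random.Random(seed)
    n = len(gold)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([gold[i] for i in idx],
                            [pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1,
                      int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

To compare two systems in the same way, one would resample the items once per iteration and compute the difference of the two F-scores on that same resample, so that the pairing of the outputs is preserved.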

If you just want to know whether there is a significant difference between the two systems, you can simply apply McNemar's test to the paired per-instance outcomes (correct vs. incorrect). The bootstrap resampling, if implemented correctly, should give you essentially the same answer at much greater computational cost.
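
As an illustration, a minimal sketch of the exact (binomial) version of McNemar's test follows. It assumes two parallel lists of booleans saying whether each system got a given test item right; the function name mcnemar_exact is made up for this example. Only the discordant items (one system right, the other wrong) enter the test.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired correct/incorrect outcomes.

    Returns (b, c, p_value), where b counts items only system A got
    right and c counts items only system B got right.
    """
    b = sum(1 for a, other in zip(correct_a, correct_b) if a and not other)
    c = sum(1 for a, other in zip(correct_a, correct_b) if other and not a)
    n = b + c
    if n == 0:
        # No discordant items: no evidence of any difference.
        return b, c, 1.0
    # Under H0 each discordant item favours A or B with probability 1/2,
    # so min(b, c) follows a Binomial(n, 0.5) distribution.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

For large numbers of discordant items, the usual chi-squared approximation (b - c)^2 / (b + c) with one degree of freedom gives nearly the same p-value.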

If you're satisfied with accuracy as an evaluation criterion, you can also compute (binomial) confidence intervals for the two systems directly without bootstrapping.  A confidence interval for the difference in accuracy can be derived from McNemar's test – I've implemented something along those lines for my PhD thesis long, long ago.
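
A rough sketch of both kinds of interval is given below. Two assumptions are mine: the Wilson score interval is used as the binomial interval (Clopper-Pearson would be another common choice), and the interval for the difference is a simple Wald-type interval based on the discordant counts b and c from McNemar's table. This is not necessarily the method from the thesis mentioned above, and the function names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(k, n, conf=0.95):
    """Wilson score interval for a binomial proportion k / n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def paired_diff_interval(b, c, n, conf=0.95):
    """Wald-type interval for accuracy(A) - accuracy(B) on paired data.

    b = items only A got right, c = items only B got right,
    n = total number of test items (concordant items included).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = (b - c) / n
    se = sqrt(b + c - (b - c) ** 2 / n) / n
    return diff - z * se, diff + z * se
```

The Wald-type interval is only a large-sample approximation and becomes unreliable when b + c is small.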

Best,
Stefan


