[Corpora-List] Question about Cross-Validation on a Multi-label Corpus

Miles Osborne miles at inf.ed.ac.uk
Thu Sep 3 10:10:47 UTC 2009


"Cross-validation" is simply a strategy for reducing variance in your
reported results.  Part of this variance will be due to under- and
over-representation of example-label pairs in any given train/test
split, so I'd not really worry about it.  Just report the averaged
results.
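
To make "just report the averaged results" concrete, here is a minimal
sketch in plain Python, assuming simple random (unstratified) folds.
The names kfold_indices, cross_validate, summarise and the
train_and_score callback are hypothetical, made up for illustration;
they do not come from any particular toolkit, and the callback stands
in for whatever multi-label learner and metric you actually use.

```python
import random
from statistics import mean, stdev


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds.

    No stratification is attempted: rare label combinations simply land
    wherever the shuffle puts them.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_and_score, k=10, seed=0):
    """Return one score per fold.

    train_and_score is a hypothetical user-supplied callback with the
    signature (train_x, train_y, test_x, test_y) -> float.
    """
    scores = []
    for held_out in kfold_indices(len(samples), k, seed):
        held = set(held_out)
        train = [i for i in range(len(samples)) if i not in held]
        scores.append(train_and_score(
            [samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in held_out], [labels[i] for i in held_out],
        ))
    return scores


def summarise(scores):
    """Mean and sample standard deviation over folds (needs >= 2 folds)."""
    return mean(scores), stdev(scores)
```

Each sample appears in exactly one test fold, so a label combination
with a single example is tested once and trained on in the other k-1
runs; the per-fold spread from summarise() shows how much that sort of
imbalance moves the numbers.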

On a different subject, I really wish people would do cross-validation
more often than they do.  Significance tests are not really a
substitute, especially when their assumptions are violated and one can
always select some other test which produces the intended shiny
results.

Miles

> Message: 1
> Date: Wed, 2 Sep 2009 12:48:35 +0100
> From: Tim Mike <tm0826 at gmail.com>
> Subject: [Corpora-List] Question about Cross-Validation on a
>        Multi-label Corpus
> To: Corpora at uib.no
>
> Hi All,
>
> I am dealing with a multi-label dataset, and would like to do
> cross-validation on it. Could you please tell me usually how we can split a
> multi-label corpus into training and validating parts? I planned to consider
> each label combination individually and split the samples having the same
> combined label into two parts, but some label combinations only have one
> sample. In this situation, what can I do? Is there a reasonable and commonly
> used way to split a multi-label corpus? Any comment would be much
> appreciated.
>
> Thanks,
>
> Tim

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora