[Corpora-List] Standard proportions for splitting a dataset

Thu Jun 27 12:58:13 UTC 2013

Hi all,

Has anyone come across a standard way of splitting a dataset into
training, development and test sets in supervised learning? That is, what
are the typical percentage sizes of each subset, especially for sequence
labelling tasks such as POS tagging and NER?

Is it something like 60% training, 20% development and 20% test?
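For concreteness, a 60/20/20 split of the kind asked about could be sketched as below. This is an illustration under assumptions, not an established standard: the function name `split_corpus` and the toy corpus are hypothetical, and the split is made over whole sentences so that no sentence's token/tag sequence is divided between two subsets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(sentences: Sequence[T], train_pct: int = 60, dev_pct: int = 20,
                 seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle whole sentences, then cut them into
    train/dev/test portions. The test share is whatever remains after
    train and dev (20% with the defaults)."""
    if train_pct < 0 or dev_pct < 0 or train_pct + dev_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    items = list(sentences)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_train = len(items) * train_pct // 100  # integer arithmetic avoids float rounding
    n_dev = len(items) * dev_pct // 100
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])


# Hypothetical toy corpus: each sentence is a list of (token, tag) pairs.
corpus = [[("word%d" % i, "NN")] for i in range(10)]
train, dev, test = split_corpus(corpus)
print(len(train), len(dev), len(test))  # 6 2 2
```

Splitting at the sentence level keeps each labelled sequence intact. If sentences from the same document are strongly correlated, splitting at the document level instead may give a less optimistic test estimate.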

Many thanks
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora