[Corpora-List] The standard size of splitting the dataset

Scott Crossley sacrossley at gmail.com
Thu Jun 27 14:22:18 UTC 2013


Check out Witten, Frank, and Hall, Data Mining: Practical Machine Learning Tools and Techniques.

They give good advice.

http://www.cs.waikato.ac.nz/ml/weka/book.html

On Jun 27, 2013, at 8:58 AM, Jack Alan wrote:

> Hi all,
> 
> Has anyone come across a standard way of splitting a dataset into training, development and test sets in supervised learning? That is, what is the typical percentage for each subset, especially for sequence labelling tasks such as POS tagging and NER?
> 
> I wonder if it is something like 60% training, 20% development and 20% test?
> 
> Many thanks
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
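
For illustration only, here is a minimal sketch of the 60/20/20 split mentioned in the question. It is not a standard. The function name `split_dataset` and its percentage parameters are hypothetical. The sketch assumes the corpus is a list of whole sentences (for example, lists of (token, tag) pairs). Splitting at the sentence level, not the token level, keeps every token of a sentence in the same subset. A fixed seed makes the shuffle reproducible, and integer percentages avoid floating-point rounding in the subset sizes.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train_pct: int = 60, dev_pct: int = 20,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle whole sentences, then cut train/dev/test.

    The test subset receives whatever remains after train and dev
    (100 - train_pct - dev_pct percent, plus any rounding leftovers).
    """
    if train_pct < 0 or dev_pct < 0 or train_pct + dev_pct > 100:
        raise ValueError("need train_pct, dev_pct >= 0 and train_pct + dev_pct <= 100")
    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n = len(shuffled)
    n_train = n * train_pct // 100         # integer arithmetic: floor of the share
    n_dev = n * dev_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])


if __name__ == "__main__":
    # Toy corpus: each item is one tagged sentence of (token, tag) pairs.
    corpus = [[("word%d" % i, "NN"), (".", ".")] for i in range(10)]
    train, dev, test = split_dataset(corpus)
    print(len(train), len(dev), len(test))  # 6 2 2
```

For very small corpora, a single fixed split can be noisy. Cross-validation, one of the evaluation methods covered in the Witten, Frank and Hall book, is a common alternative.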

Scott Crossley, Ph.D.
Department of Applied Linguistics/ESL
Georgia State University
http://www2.gsu.edu/~wwwesl/scottcrossleybio.html
