[Corpora-List] The standard size of splitting the dataset
Scott Crossley
sacrossley at gmail.com
Thu Jun 27 14:22:18 UTC 2013
Check out Witten, Frank, and Hall, Data Mining: Practical Machine Learning Tools and Techniques.
They give good advice.
http://www.cs.waikato.ac.nz/ml/weka/book.html
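For illustration only, here is a minimal sketch of the 60/20/20 split mentioned in the question quoted below. The proportions come from that question and are not a recommendation from the book. The function name split_dataset and its parameters are hypothetical. It assumes the corpus is a list of whole sentences, so that for POS or NER data no sentence is cut across subsets.

```python
import random


def split_dataset(sentences, train_frac=0.6, dev_frac=0.2, seed=13):
    """Shuffle whole sentences and cut them into train/dev/test lists.

    The default fractions (60/20/20) are an assumption taken from the
    question, not a standard; the test set receives whatever remains.
    """
    if train_frac <= 0 or dev_frac < 0 or train_frac + dev_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    items = list(sentences)             # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible

    n_train = round(len(items) * train_frac)
    n_dev = round(len(items) * dev_frac)

    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


if __name__ == "__main__":
    # Toy corpus: each item stands in for one tagged sentence.
    corpus = [f"sentence-{i}" for i in range(100)]
    train, dev, test = split_dataset(corpus)
    print(len(train), len(dev), len(test))
```

Splitting at the sentence level (or at the document level, if sentences from one document are strongly related) keeps the test data from leaking into training.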
On Jun 27, 2013, at 8:58 AM, Jack Alan wrote:
> Hi all,
>
> Has anyone come across a standard way of splitting a dataset into training, development and test sets in supervised learning? I mean, what is the typical percentage for each subset, especially for sequence labelling tasks such as POS tagging and NER?
>
> I wonder if it is something like 60% training, 20% development and 20% test?
>
> Many thanks
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
Scott Crossley, Ph.D.
Department of Applied Linguistics/ESL
Georgia State University
http://www2.gsu.edu/~wwwesl/scottcrossleybio.html