[Corpora-List] Corpora for language identification training?

Dean Jones dean.m.jones at gmail.com
Thu Apr 19 09:05:44 UTC 2007


Hello all,

I'd like to train a classifier to perform language identification,
and, before I go ahead and create a corpus for this purpose, I'd like
to ask whether anyone on this list knows of anything suitable. The
main reason I'm asking is that I'm particularly interested in finding
a corpus that has been used in the comparative evaluation of
language identification systems. Languages that we'd initially like to
cover are English, French, Italian, German and Spanish. Thanks for any
help,

Best wishes,

Dean.