[Corpora-List] Corpora for language identification training?

Adam Funk a.funk at dcs.shef.ac.uk
Thu Apr 19 13:15:17 UTC 2007


[19/04/07 13:35] Dean Jones wrote:

> Sorry, I wasn't clear. Personally I'm interested in language ID for
> "written" texts - specifically, email, although others on the list may
> be interested in spoken language ID, so I wouldn't want to discourage
> responses about that.

Here's a tool you might be interested in:

http://www.let.rug.nl/~vannoord/TextCat/


along with a list of others:

http://www.let.rug.nl/~vannoord/TextCat/competitors.html


