[Corpora-List] Corpora for language identification training?

Lluis Padro padro at lsi.upc.edu
Thu Apr 19 09:56:13 UTC 2007


Dean Jones wrote:
> I'd like to train a classifier to perform language identification,
> and, before I go ahead and create a corpus for this purpose, I'd like
> to ask whether anyone on this list knows of anything suitable. The
> main reason I'm asking is that I'm particularly interested in finding
> something which has been used in the comparative evaluation of
> language identification systems. Languages that we'd initially like to
> cover are English, French, Italian, German and Spanish. Thanks for any
> help,
You can try our MM-based identifier. It's GPL, easy to train for new
languages, and it already includes models for most of the languages
you mention.

Visit http://www.lsi.upc.edu/~nlp and look under the "resources" menu.

Best,
-- 
------------------------------------------------------------------------
*Lluís Padró*
Despatx ?-S112
Campus Nord UPC
C/ Jordi Girona 1-3
08034 Barcelona, Spain
Tel: +34 934 134 015
Fax: +34 934 137 833
padro at lsi.upc.edu <mailto:padro at lsi.upc.es>
www.lsi.upc.edu/~padro <http://www.lsi.upc.es/%7Epadro>
------------------------------------------------------------------------
UNIVERSITAT POLITÈCNICA DE CATALUNYA
Dept. Llenguatges i Sistemes Informàtics <http://www.lsi.upc.es>
TALP <http://www.talp.upc.es> Research Center
------------------------------------------------------------------------
