<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Dean Jones wrote:
<blockquote
cite="mid203872670704190205p518aa14et6f7ea1dcc4fe8ced@mail.gmail.com"
type="cite">I'd like to train a classifier to perform language
identification,
<br>
and, before I go ahead and create a corpus for this purpose, I'd like
<br>
to ask whether anyone on this list knows of anything suitable. The
<br>
main reason I'm asking is that I'm particularly interested in finding
<br>
something which has been used in the comparative evaluation of
<br>
language identification systems. Languages that we'd initially like to
<br>
cover are English, French, Italian, German and Spanish. Thanks for any
<br>
help,
<br>
</blockquote>
You can try our MM-based identifier. It's GPL-licensed, easy to train
for new languages, and it already includes models <br>
for most of the languages you mention.<br>
<br>
Visit <a class="moz-txt-link-freetext" href="http://www.lsi.upc.edu/~nlp">http://www.lsi.upc.edu/~nlp</a> and look under the "resources" menu.<br>
<br>
Best,<br>
<div class="moz-signature">-- <br>
<table>
<tbody>
<tr>
<td colspan="2" align="center">
<hr width="100%"></td>
</tr>
<tr>
<td valign="top"><font color="#0000aa"><b>Lluís Padró</b></font><br>
<font color="#2f2f66">Despatx Ω-S112<br>
Campus Nord UPC<br>
C/ Jordi Girona 1-3<br>
08034 Barcelona, Spain</font></td>
<td valign="top"><font color="#0000aa">Tel: <tt><font size="+1">+34
934 134 015</font></tt><br>
Fax: <tt><font size="+1">+34 934 137 833</font></tt></font><br>
<tt><font size="+1"><a href="mailto:padro@lsi.upc.es">padro@lsi.upc.edu</a><br>
<a href="http://www.lsi.upc.es/%7Epadro" target="_top">www.lsi.upc.edu/~padro</a></font></tt></td>
</tr>
<tr>
<td colspan="2" align="center">
<hr width="100%"><font color="#2f2f66">UNIVERSITAT POLITÈCNICA DE
CATALUNYA<br>
Dept. <a href="http://www.lsi.upc.es" target="_top">Llenguatges i
Sistemes Informàtics</a><br>
<a href="http://www.talp.upc.es" target="_top">TALP</a> Research
Center</font>
<hr width="100%"></td>
</tr>
</tbody>
</table>
</div>
</body>
</html>