[Corpora] [Corpora-List] Language identification tool

kim gerdes kim at gerdes.fr
Sat Nov 15 17:08:19 UTC 2014


Hello,

On Fri, Nov 14, 2014 at 6:21 PM, Valerio Basile <v.basile at rug.nl> wrote:

> > is any of you aware of a language identification tool that covers at
> > least the EU official languages.
> > Preferably a stand alone application.
>
> I'd like to throw in TextCat:
>
>   http://odur.let.rug.nl/~vannoord/TextCat/
>
> It's a Perl script, and it supports 76 languages, the complete list is on
> the website.
>
>
You can try out TextCat and the Compact Language Detector 2 online at
http://elizia.net/languageDetector/
and compare them with a simple Python script that I have written,
languageDetector, which is based on Unicode character identification and
trigrams.
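
As a rough illustration of the general idea (this is not the actual
languageDetector code), here is a minimal sketch that first narrows the
candidates by the dominant Unicode script of the input and then picks the
language whose trigram profile overlaps most with the text. The function
names, the tiny training samples and the overlap score are all assumptions
made up for this example; a real tool would train on far more text and
would probably use a better scoring scheme.

```python
import unicodedata
from collections import Counter

# Tiny illustrative samples (assumption: real profiles need much more text).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on "
          "the mat with the other animals",
    "fr": "le renard brun saute par-dessus le chien paresseux et le chat "
          "est assis sur le tapis avec les autres",
    "de": "der schnelle braune fuchs springt über den faulen hund und die "
          "katze sitzt auf der matte mit den anderen",
    "ru": "быстрая коричневая лиса прыгает через ленивую собаку",
    "el": "η γρήγορη καφέ αλεπού πηδά πάνω από το τεμπέλικο σκυλί",
}


def dominant_script(text):
    """Guess the script from Unicode character names (LATIN, CYRILLIC, ...)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # The first word of the character name is used as a script label.
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def trigrams(text):
    """Count character trigrams of the lowercased, whitespace-normalised text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# Hypothetical per-language profiles built from the samples above.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}
SCRIPTS = {lang: dominant_script(sample) for lang, sample in SAMPLES.items()}


def detect(text):
    """Return the best-matching language code, or None if no script matches."""
    script = dominant_script(text)
    candidates = [lang for lang in PROFILES if SCRIPTS[lang] == script]
    if not candidates:
        return None
    if len(candidates) == 1:
        # The script alone is enough to decide.
        return candidates[0]
    grams = trigrams(text)

    def score(lang):
        # Number of trigram occurrences in the text that the profile also knows.
        profile = PROFILES[lang]
        return sum(n for gram, n in grams.items() if gram in profile)

    return max(candidates, key=score)


if __name__ == "__main__":
    for sentence in ["the dog and the cat", "le chat et le chien",
                     "der hund und die katze", "собака и лиса"]:
        print(sentence, "->", detect(sentence))
```

With only five languages the script check already settles Russian and
Greek; the trigram comparison is only needed to separate the languages
that share the Latin script.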

best
kim
--
gerdes.fr



> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>

