[Corpora] [Corpora-List] Language identification tool

tb at ldwin.net
Fri Nov 14 12:33:01 UTC 2014


Allow me to add a gratuitous plug for langid.py, for language identification of monolingual documents:

https://github.com/saffsd/langid.py

and polyglot if you expect to have some multilingual documents:

https://github.com/saffsd/polyglot

Both are written in Python; langid.py also has Java and C implementations.
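
For the original question (covering at least the official EU languages), a minimal sketch of restricting langid.py to those languages might look like the following. It assumes langid.py is installed (e.g. via `pip install langid`) and that its pretrained model covers all 24 codes listed; `EU_LANGS`, `make_eu_classifier` and `label_lines` are names made up for this illustration, not part of langid.py.

```python
# ISO 639-1 codes for the 24 official EU languages (as of 2014).
# Assumption: all of these are covered by langid.py's pretrained model.
EU_LANGS = [
    "bg", "cs", "da", "de", "el", "en", "es", "et",
    "fi", "fr", "ga", "hr", "hu", "it", "lt", "lv",
    "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def make_eu_classifier():
    """Return langid's classify function, restricted to EU_LANGS.

    The import is kept local so the rest of this sketch can be used
    (and tested) with any other classifier of the same shape.
    """
    import langid  # third-party package

    # Constrain the set of candidate languages to the EU official ones.
    langid.set_languages(EU_LANGS)
    return langid.classify


def label_lines(lines, classify):
    """Label each non-empty line with a language code.

    `classify` is any callable mapping a string to a
    (language_code, score) pair, as langid.classify does.
    Returns a list of (language_code, score, text) tuples.
    """
    results = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        lang, score = classify(text)
        results.append((lang, score, text))
    return results
```

Typical use would be `label_lines(open("input.txt", encoding="utf-8"), make_eu_classifier())`, where `input.txt` is a placeholder file name. Restricting the language set this way trades coverage for fewer confusions with languages you do not expect to see.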


Tim

At Fri, 14 Nov 2014 14:10:12 +0200,
Ivelina Nikolova wrote:
> 
> On 11/14/2014 01:56 PM, Matthew Purver wrote:
> > you can run Google's Compact Language Detector as a Python application:
> >
> > https://code.google.com/p/chromium-compact-language-detector/
> Thanks Matthew!
> 
> Sérgio Matos also suggested this Java application:
> 
> https://code.google.com/p/language-detection/
> 
> Best,
> Ivelina
> 
> >
> >
> > On 14/11/2014 10:46, Ivelina Nikolova wrote:
> >> Dear corpora members,
> >>
> >> is any of you aware of a language identification tool that covers at
> >> least the official EU languages?
> >> Preferably a standalone application.
> >>
> >> Thanks in advance,
> >> Ivelina
> >>
> >
> 
> 
> -- 
> Ivelina Nikolova
> PhD student in Computer Science
> Linguistic Modelling Department
> Institute of Information and Communication Technologies
> Bulgarian Academy of Sciences
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

--

Tim Baldwin
Professor
ARC Future Fellow
Department of Computing and Information Systems
The University of Melbourne
Victoria 3010, Australia

Tel: (+61)-3-8344-1363
