[Corpora] [Corpora-List] Language identification tool
tb at ldwin.net
Fri Nov 14 12:33:01 UTC 2014
Allow me to add a gratuitous plug for langid.py for monolingual langid:
https://github.com/saffsd/langid.py
and polyglot if you expect to have some multilingual documents:
https://github.com/saffsd/polyglot
Both written in Python, and langid.py also has Java and C implementations.
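
For the original question (coverage of at least the official EU languages), a minimal sketch of how langid.py might be called from Python follows. It assumes the package is installed (pip install langid) and uses its classify() and set_languages() functions; the EU_LANGS list and the identify() helper, including its classify parameter, are illustrative names of my own and not part of langid.py.

```python
# Sketch: restrict langid.py to the 24 official EU languages (as of 2014).
# EU_LANGS and identify() are hypothetical names used for illustration only.
EU_LANGS = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def identify(text, classify=None):
    """Return (language_code, score) for a piece of text.

    `classify` is an assumed injection point so the helper can be tried
    without langid installed; when omitted, langid.classify is used after
    limiting the candidate set to EU_LANGS.
    """
    if classify is None:
        import langid  # third-party package: pip install langid

        langid.set_languages(EU_LANGS)  # only consider these ISO 639-1 codes
        classify = langid.classify
    lang, score = classify(text)
    return lang, float(score)
```

With langid installed, identify("Това е кратко изречение.") would return a pair of a language code and a score; the score is a model score, so it is best used for comparing candidates rather than read as a calibrated probability.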
Tim
At Fri, 14 Nov 2014 14:10:12 +0200,
Ivelina Nikolova wrote:
>
> On 11/14/2014 01:56 PM, Matthew Purver wrote:
> > you can run Google's Compact Language Detector as a Python application:
> >
> > https://code.google.com/p/chromium-compact-language-detector/
> Thanks Matthew!
>
> Sérgio Matos also suggested this Java application:
>
> https://code.google.com/p/language-detection/
>
> Best,
> Ivelina
>
> >
> >
> > On 14/11/2014 10:46, Ivelina Nikolova wrote:
> >> Dear corpora members,
> >>
> >> is any of you aware of a language identification tool that covers at
> >> least the official EU languages?
> >> Preferably a standalone application.
> >>
> >> Thanks in advance,
> >> Ivelina
> >>
> >
>
>
> --
> Ivelina Nikolova
> PhD student in Computer Science
> Linguistic Modelling Department
> Institute of Information and Communication Technologies
> Bulgarian Academy of Sciences
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
--
Tim Baldwin
Professor
ARC Future Fellow
Department of Computing and Information Systems
The University of Melbourne
Victoria 3010, Australia
Tel: (+61)-3-8344-1363