[Corpora-List] Japanese and Korean PoS-taggers and lemmatisers
Johannes Goller
gollerjo at cis.uni-muenchen.de
Wed Nov 19 13:44:40 UTC 2008
Hello Viktor,
you may also want to consider using "mecab", which is newer than ChaSen
and also has many happy users:
http://sourceforge.net/project/showfiles.php?group_id=177856
Another one, which follows slightly different tokenization and
lemmatization standards, is "Juman":
http://www-lab25.kuee.kyoto-u.ac.jp/nl-resource/juman-e.html
A very high-level comparison of several tokenizers is given on this web
page (in Japanese):
http://mecab.sourceforge.net/
"mecab" can be installed easily on Red Hat-derived systems using
$ yum install mecab
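To get from mecab's output to (token, PoS, lemma) triples, something like
the following Python sketch may help. It assumes the default output format
with the IPAdic dictionary, where each line holds the surface form, a tab,
and comma-separated features with the base form in the seventh field. The
names Token, parse_mecab_output and SAMPLE are made up for illustration,
and SAMPLE only mirrors that layout; other dictionaries arrange the
features differently.

```python
# Sketch only: assumes MeCab's default output with the IPAdic feature layout.
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str  # token as it appears in the text
    pos: str      # top-level part-of-speech tag
    lemma: str    # base (dictionary) form


def parse_mecab_output(text: str) -> List[Token]:
    """Turn MeCab-style output lines into (surface, pos, lemma) tokens."""
    tokens = []
    for line in text.splitlines():
        # "EOS" marks the end of a sentence; skip it and blank lines.
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        pos = fields[0]
        # Assumed IPAdic layout: field 7 is the base form, "*" means none given.
        if len(fields) > 6 and fields[6] != "*":
            lemma = fields[6]
        else:
            lemma = surface
        tokens.append(Token(surface, pos, lemma))
    return tokens


# Illustrative input in the assumed layout (not a captured run).
SAMPLE = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "EOS\n"
)

if __name__ == "__main__":
    for tok in parse_mecab_output(SAMPLE):
        print(tok.surface, tok.pos, tok.lemma)
```

In practice the text passed to parse_mecab_output would come from running
mecab on your own input, e.g. by piping a file through the command.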
regards,
Johannes Goller.
> For Japanese, I am a happy user of ChaSen:
>
> http://chasen.naist.jp/hiki/ChaSen/
>
> ... which you can install as debian package, if I remember correctly.
>
> Best regards,
>
> Marco
>
>
> v.pekar at gmail.com wrote:
> > Dear all,
> >
> > Can anyone recommend any part-of-speech taggers and lemmatisers for Japanese and Korean? Freely available tools are preferred, but I'd be interested to know about commercial ones as well.
> >
> > Many thanks,
> >
> > Viktor
> >
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora