[Corpora-List] Japanese and Korean PoS-taggers and lemmatisers

Johannes Goller gollerjo at cis.uni-muenchen.de
Wed Nov 19 13:44:40 UTC 2008


Hello Viktor,

You may also want to consider "mecab", which is newer than ChaSen
and also has many happy users:

http://sourceforge.net/project/showfiles.php?group_id=177856

Another one, which follows slightly different tokenization and
lemmatization standards, is "Juman":

http://www-lab25.kuee.kyoto-u.ac.jp/nl-resource/juman-e.html

A very high-level comparison of several tokenizers is given on this web
page (in Japanese):

http://mecab.sourceforge.net/


"mecab" can be easily installed on Red Hat-derived systems using

$ yum install mecab
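
Once it is installed, the tagger's output can be turned into
(token, PoS, lemma) triples with a few lines of Python. The following is
only a rough sketch under some assumptions: the "mecab" binary is on the
PATH, an IPADIC-style dictionary in UTF-8 is installed (so each output
line is the surface form, a tab, then comma-separated features with the
PoS in field 0 and the base form in field 6), and the default output
format is in use. Other dictionaries order their features differently.
The function names are made up for this example.

```python
import subprocess
from typing import List, Tuple

# (surface form, part of speech, lemma)
Token = Tuple[str, str, str]


def parse_mecab_output(text: str) -> List[Token]:
    """Parse MeCab's default output, assuming an IPADIC-style feature layout."""
    tokens: List[Token] = []
    for line in text.splitlines():
        # "EOS" marks the end of each sentence; skip it and empty lines.
        if not line or line == "EOS":
            continue
        surface, sep, feature_str = line.partition("\t")
        if not sep:
            # Not a "surface<TAB>features" line; ignore it.
            continue
        features = feature_str.split(",")
        pos = features[0]
        # Assumption: field 6 holds the base form. Unknown words may have
        # "*" there (or fewer fields), so fall back to the surface form.
        if len(features) > 6 and features[6] != "*":
            lemma = features[6]
        else:
            lemma = surface
        tokens.append((surface, pos, lemma))
    return tokens


def tag(sentence: str) -> List[Token]:
    """Run the mecab command-line tool on one sentence and parse the result."""
    result = subprocess.run(
        ["mecab"],
        input=sentence + "\n",
        capture_output=True,
        encoding="utf-8",  # assumption: the installed dictionary is UTF-8
        check=True,
    )
    return parse_mecab_output(result.stdout)
```

Calling tag() on a Japanese sentence should then return one triple per
token. If the packaged dictionary uses EUC-JP rather than UTF-8, the
encoding argument would need to be changed accordingly.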



regards,

Johannes Goller.





> For Japanese, I am a happy user of ChaSen:
> 
> http://chasen.naist.jp/hiki/ChaSen/
> 
> ... which you can install as debian package, if I remember correctly.
> 
> Best regards,
> 
> Marco
> 
> 
> v.pekar at gmail.com wrote:
> > Dear all,
> > 
> > Can anyone recommend any part-of-speech taggers and lemmatisers for Japanese and Korean? Freely available tools are preferred, but I'd be interested to know about commercial ones as well.
> > 
> > Many thanks,
> > 
> > Viktor
> > 
> 
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

