[Corpora-List] POS tagger for Japanese

Christian Moen cm at atilika.com
Sun Sep 15 16:02:20 UTC 2013


Hello Amy,

If you're using C/C++, I recommend having a look at MeCab (https://code.google.com/p/mecab/).  If you're using Java, you might find Kuromoji (http://atilika.org/) useful; its segmentation is almost identical to MeCab's (the middle dot is treated somewhat differently).

To my knowledge, there is no freely available statistical model for Japanese trained exclusively on tweets (similar to CMU ARK's TweetNLP).  However, UniDic is based on a balanced corpus, and I'd start experimenting with that model/dictionary.  Please feel free to get in touch directly if you need help getting started with Kuromoji.

Best regards,

Christian Moen
アティリカ株式会社
http://www.atilika.com

On Sep 15, 2013, at 3:26 AM, Amy Aisha Brown <amy-aisha.brown at open.ac.uk> wrote:

> Dear all,
> 
> This is a long shot but I am looking for a POS tagging/morphological analysis system for Japanese that works (well) with tweets (i.e., something that has been trained with social media texts).
> 
> If anyone has any information about this, I would love to hear from you.
> 
> Thanks in advance!
> 
> Amy Brown
> 
> -- 
> Amy Aisha Brown
> Research Student
> Faculty of Education and Language Studies
> The Open University
> amy-aisha.brown at open.ac.uk
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
