[Corpora-List] POS tagger for Japanese

Michal Ptaszynski michal.ptaszynski at gmail.com
Sun Sep 15 16:31:55 UTC 2013


Dear Amy

MeCab is the standard tool to look at. You can either retrain it, if you have an appropriate corpus (pretty difficult due to some bugs Kudo left in the code), or just add words to the user dictionary, which is a little troublesome at the beginning but works pretty well. In our research we've been adding words that are used in cyberbullying, and it helps a lot. MeCab doesn't lose its speed, and you can semi-automate the process of adding words by using templates. If you want to try, I can give you a hand.
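
To give a rough idea of what adding words from a template can look like, here is a minimal sketch in Python. It assumes the IPADIC user-dictionary CSV layout (13 columns: surface, left context ID, right context ID, cost, six POS/conjugation fields, base form, reading, pronunciation). The names NOUN_TEMPLATE, make_entry and build_csv, the cost value 5000, and the sample word are only illustrative assumptions, not part of MeCab and not the exact setup we use.

```python
import csv
import io

# Hypothetical template for a general noun in the IPADIC 13-column layout.
# The context IDs are left empty on the assumption that mecab-dict-index
# assigns them when compiling; the cost is an arbitrary placeholder that
# should be tuned for real use.
NOUN_TEMPLATE = {
    "left_id": "",
    "right_id": "",
    "cost": 5000,
    # POS, POS sub-1, POS sub-2, POS sub-3, conjugation type, conjugation form
    "features": ["名詞", "一般", "*", "*", "*", "*"],
}


def make_entry(surface, reading, template=NOUN_TEMPLATE):
    """Build one dictionary row; the surface form doubles as the base form."""
    return [
        surface,
        template["left_id"],
        template["right_id"],
        str(template["cost"]),
        *template["features"],
        surface,   # base form
        reading,   # reading (katakana)
        reading,   # pronunciation, assumed equal to the reading here
    ]


def build_csv(words, template=NOUN_TEMPLATE):
    """Render (surface, reading) pairs as user-dictionary CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for surface, reading in words:
        writer.writerow(make_entry(surface, reading, template))
    return buf.getvalue()


if __name__ == "__main__":
    # Sample word chosen only for illustration.
    print(build_csv([("リツイート", "リツイート")]), end="")
```

The resulting CSV would then be compiled with mecab-dict-index (for example, mecab-dict-index -d <system dictionary dir> -u user.dic -f utf-8 -t utf-8 user.csv, where the paths and encodings depend on your installation) and loaded with mecab -u user.dic or a userdic entry in mecabrc.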
Best,
--
Michal Ptaszynski

On 15 Sep 2013, at 03:26, Amy Aisha Brown <amy-aisha.brown at open.ac.uk> wrote:

> Dear all,
> 
> This is a long shot but I am looking for a POS tagging/morphological analysis system for Japanese that works (well) with tweets (i.e., something that has been trained with social media texts).
> 
> If anyone has any information about this, I would love to hear from you.
> 
> Thanks in advance!
> 
> Amy Brown
> 
> -- 
> Amy Aisha Brown
> Research Student
> Faculty of Education and Language Studies
> The Open University
> amy-aisha.brown at open.ac.uk