[Corpora-List] Japanese corpora

Marco Baroni baroni at sslmit.unibo.it
Wed Jul 20 11:17:34 UTC 2005


This is definitely not the ideal solution, but what we do is download
Japanese "corpora" from the web (if you are interested, I can send you URL
lists corresponding to the documents in our corpus, and tools to download
them), and then tokenize and POS-tag them using ChaSen:

http://chasen.aist-nara.ac.jp/hiki/ChaSen/

Once you have a corpus tagged with ChaSen, you could use it to create
other resources (e.g., simple dictionaries of word/morphological-feature
pairs).
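
To make the dictionary idea concrete, here is a rough sketch in Python (not
our actual tools). It assumes ChaSen's default output layout, i.e. one token
per line with tab-separated surface form, reading, base form, POS, conjugation
type and conjugation form, and "EOS" closing each sentence; if you run ChaSen
with a custom output format, adjust the column order. The function name
build_lexicon and the sample lines are made up for illustration.

```python
from collections import defaultdict


def build_lexicon(chasen_lines):
    """Map each surface form to the set of analyses observed for it.

    Assumes tab-separated ChaSen-style lines:
    surface, reading, base form, POS, conjugation type, conjugation form.
    """
    lexicon = defaultdict(set)
    for line in chasen_lines:
        line = line.rstrip("\n")
        # Skip blank lines and sentence boundary markers.
        if not line or line == "EOS":
            continue
        fields = line.split("\t")
        # Skip lines that do not even carry a POS column.
        if len(fields) < 4:
            continue
        # Uninflected words may lack the two conjugation columns.
        fields += [""] * (6 - len(fields))
        surface, _reading, base, pos, ctype, cform = (
            f.strip() for f in fields[:6]
        )
        lexicon[surface].add((base, pos, ctype, cform))
    return lexicon


# Hypothetical sample in the assumed layout (not real corpus output).
SAMPLE = (
    "猫\tネコ\t猫\t名詞-一般\t\t\n"
    "が\tガ\tが\t助詞-格助詞-一般\t\t\n"
    "走る\tハシル\t走る\t動詞-自立\t五段・ラ行\t基本形\n"
    "EOS\n"
)

lexicon = build_lexicon(SAMPLE.splitlines())
for surface, analyses in lexicon.items():
    print(surface, sorted(analyses))
```

Keeping a set of analyses per surface form means that ambiguous words simply
accumulate several entries as more of the corpus is read.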

Regards,

Marco


