[Corpora-List] Chasen and Japanese

Laurence Anthony anthony0122 at gmail.com
Tue Jun 28 17:14:49 UTC 2011


2011/6/29 Mike Scott <mike at lexically.net>

> Depends who's doing the plumbing, really. In WordSmith there is a converter
> which will do that already, but Chasen is a different piece of kit and I
> think you need a Japanese plumber for that!
> Mike
>
>
It appears that there might be a way to get Chasen to output UTF-16, but it
looks a little complicated and involves creating UTF-16 versions of the
grammar and dictionary files. The default setup supports only EUC-JP (on
Linux) and Shift_JIS (on Windows), with some options to get ISO-8859-1 and
UTF-8 encodings.

The information is here (again in Japanese):
http://chasen-legacy.sourceforge.jp/
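
If all you need is UTF-16 text at the end, a simpler route may be to leave
Chasen in its default encoding and convert its output afterwards. Here is a
minimal Python sketch, assuming the output was saved as EUC-JP. The function
names are made up for illustration and are not part of Chasen:

```python
def transcode(data: bytes, src: str = "euc_jp", dst: str = "utf-16") -> bytes:
    """Decode bytes from the source encoding and re-encode them."""
    return data.decode(src).encode(dst)


def transcode_file(src_path: str, dst_path: str,
                   src: str = "euc_jp", dst: str = "utf-16") -> None:
    """Convert a whole file; both paths are hypothetical, supplied by the caller."""
    # Read and write in binary so line endings pass through unchanged.
    with open(src_path, "rb") as fin:
        data = fin.read()
    with open(dst_path, "wb") as fout:
        fout.write(transcode(data, src, dst))
```

For output produced on Windows, pass src="shift_jis" instead. Note that
Python's "utf-16" codec writes a byte order mark at the start of the output.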

Laurence.