><br>> A Japanese user of WordSmith needs help with the Chasen
software, which I understand provides segmentation of the string of
characters in Japanese. Desired output form would be UTF16 for
WordSmith.<br>><br>> Can anyone advise, please? Is this possible?<br>><br>> Mike<br><br><br>Hi Mike,<br><br>I think Chasen only outputs to the ANSI code page (Shift-JIS here in Japan) or UTF-8. However, an alternative tool is MeCab, which does offer experimental UTF-16 support.<br>
<br>You can read about it here (unfortunately everything is in Japanese):<br><a href="http://mecab.sourceforge.net">http://mecab.sourceforge.net</a><br><br>Here's a summary of the latest version (dated 2009):<br>2009-09-27 MeCab 0.98<br>
UTF-16 support (experimental)<br>Character encoding conversion in the Windows version now uses native APIs such as MultiByteToWideChar<br>Source code converted to Google coding style<br>Added EON (end of N-best) to the format options (-S or --eon-format)<br>Fixed a problem with the handling of half-width katakana in Shift-JIS environments<br>Online learning support (experimental)<br>
Now compiles without -Wno-deprecated<br>Minor bug fixes<br><br>Hope that helps!<br>Laurence.<br>
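<br>P.S. If the segmenter can only write Shift-JIS or UTF-8, another option may be to transcode its output to UTF-16 afterwards. Below is a minimal sketch in Python, not something taken from Chasen or MeCab. It assumes the target tool wants little-endian UTF-16 with a byte order mark, which I have not verified for WordSmith, and the function name `transcode_to_utf16` is just an illustrative choice.<br>

```python
import codecs


def transcode_to_utf16(data: bytes, source_encoding: str = "utf-8") -> bytes:
    """Decode segmenter output and re-encode it as UTF-16 LE with a BOM.

    Assumption: the consumer expects little-endian UTF-16 preceded by a
    byte order mark. Adjust if the target tool needs something else.
    """
    # Decode from the segmenter's encoding ("utf-8" or "shift_jis").
    text = data.decode(source_encoding)
    # Write the BOM explicitly so the byte order does not depend on the machine.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Example: a space-segmented sentence as it might come out in either encoding.
sentence = "私 は 学生 です\n"
from_utf8 = transcode_to_utf16(sentence.encode("utf-8"), "utf-8")
from_sjis = transcode_to_utf16(sentence.encode("shift_jis"), "shift_jis")
```

For whole files, the same function can be applied to the bytes read from the segmenter's output file and the result written back in binary mode.<br>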