Incorrect word segmenting for Chinese characters

Toh An popsune1 at gmail.com
Tue Dec 13 19:11:25 UTC 2016


Thank you Brian!

On Tuesday, December 13, 2016 at 5:20:10 PM UTC+8, Toh An wrote:
>
> Hi, I have encountered a problem with Chinese data. CLAN does not appear
> to segment Chinese sentences into word tokens correctly, and part-of-speech
> tagging is affected as well. Attached is the CLAN output from running the
> mlu and freq commands on a test file without a %mor tier (TestFileOutput)
> and on the same test file with a %mor tier added (TestFileMor). Does anyone
> have any idea how to resolve this? Thanks.
>
