Incorrect word segmenting for Chinese characters

Brian MacWhinney macw at cmu.edu
Tue Dec 13 14:00:02 UTC 2016


Dear Toh An,
     You have to separate Chinese words with spaces, because CLAN treats each space-delimited string as one word.  Attached is a version of your file that gets analyzed correctly once the spaces are put in.
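
     To illustrate why the spaces matter, here is a minimal Python sketch. It is not CLAN's actual code; it only assumes that words are taken to be whitespace-delimited strings, as the advice above implies. The function name count_tokens and the two sample utterances are hypothetical and are not taken from TestFile.cha.

```python
# Hypothetical illustration: counts whitespace-delimited tokens the way a
# space-based tokenizer would. This is NOT CLAN's implementation.

TERMINATORS = {".", "?", "!"}  # utterance-final punctuation, not counted as words


def count_tokens(utterance: str) -> int:
    """Return the number of whitespace-delimited tokens, excluding terminators."""
    tokens = [t for t in utterance.split() if t not in TERMINATORS]
    return len(tokens)


# Made-up example utterances (not from the attached test file).
unsegmented = "我喜欢吃苹果 ."      # no spaces between words
segmented = "我 喜欢 吃 苹果 ."     # words separated by spaces

print(count_tokens(unsegmented))  # 1: the whole string is seen as a single word
print(count_tokens(segmented))    # 4: each word is seen separately
```

     Under that assumption, the unsegmented utterance counts as one long word, which would explain the odd mlu and freq results, while the spaced version yields one token per word.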

-- Brian MacWhinney

From: <chibolts at googlegroups.com> on behalf of Toh An <popsune1 at gmail.com>
Reply-To: "chibolts at googlegroups.com" <chibolts at googlegroups.com>
Date: Tuesday, December 13, 2016 at 5:20 PM
To: chibolts <chibolts at googlegroups.com>
Subject: Incorrect word segmenting for Chinese characters


Hi, I have encountered a problem with Chinese data. CLAN does not appear to segment Chinese sentences into word tokens correctly, and part-of-speech tagging is affected as well. Attached is the CLAN output from running the mlu and freq commands on a test file without a %mor tier (TestFileOutput) and on the same test file with a %mor tier added (TestFileMor). Does anyone have any ideas how to resolve this? Thanks.
--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
To post to this group, send email to chibolts at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/f1080fe4-4646-4e24-bd2f-7b6b753d7c54%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: TestFile.cha
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20161213/edfef77b/attachment.ksh>

