Incorrect word segmenting for Chinese characters

Toh An popsune1 at gmail.com
Tue Dec 13 18:28:02 UTC 2016


Dear Brian,

Thank you for the prompt reply! Is there an automated way of adding spaces,
or do we have to add them manually?

Toh

On Tuesday, December 13, 2016 at 5:20:10 PM UTC+8, Toh An wrote:
>
> Hi, I have encountered a problem with Chinese data. CLAN does not appear
> to segment Chinese sentences into word tokens correctly. Part-of-speech
> tagging is also affected. Attached are the CLAN outputs from running the mlu
> and freq commands on a test file without a mor tier (TestFileOutput) and on
> the same test file with a mor tier added (TestFileMor). Does anyone have any
> idea how to resolve this? Thanks.
>
