Incorrect word segmenting for Chinese characters
Toh An
popsune1 at gmail.com
Tue Dec 13 19:11:25 UTC 2016
Thank you, Brian!
On Tuesday, December 13, 2016 at 5:20:10 PM UTC+8, Toh An wrote:
>
> Hi, I have encountered a problem with Chinese data. CLAN does not appear
> to segment Chinese sentences into word tokens correctly, and part-of-speech
> tagging is also affected. Attached is the CLAN output after running the mlu
> and freq commands on a test file without a mor tier (TestFileOutput), and
> the same test file with a mor tier added (TestFileMor). Does anyone have
> any ideas on how to resolve this? Thanks.
>