new Chinese segmenter/translator

Brian MacWhinney macw at andrew.cmu.edu
Wed Feb 21 00:51:44 UTC 2018


Dear CHIBolts,
    This letter is directed to people interested in working with Mandarin Chinese in CLAN.  John Kowalski has created a new segmenter/translator for CHAT files.  It can be downloaded from https://talkbank.org/morgrams by following the link to Chinese Segmenter/Translator.  The system relies on the more than 80,000 word forms in CEDict to segment CHAT files that do not have spaces between words.  It can also be used to convert between Simplified and Traditional script.  The distribution includes a readme file, which I will not repeat here.  We are happy to work with people using this system to (1) remove some forms from the CEDict list, (2) add forms to ZHO MOR, and (3) add patterns to the segmenter for any CHAT characters and other items that are not yet being properly recognized.
   Currently, this works only for Mandarin, although versions for Cantonese, Thai, Japanese, or other languages that don't use spaces might be possible if there are resources for these languages similar to CEDict.
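
   For readers unfamiliar with dictionary-based segmentation, the general idea can be sketched in a few lines of Python. This is only an illustration of one common technique (greedy forward longest-match), not a description of how the actual segmenter works; the readme in the distribution is the authority on that. The function name `segment` and the four-word `toy_lexicon` are hypothetical stand-ins, and a real run would load the word forms from CEDict instead.

```python
def segment(text, lexicon, max_len=None):
    """Greedy forward longest-match segmentation (illustrative sketch only).

    At each position, take the longest substring found in the lexicon;
    if nothing matches, emit a single character and move on.
    """
    if max_len is None:
        # Longest entry in the lexicon bounds how far ahead we need to look.
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; the real system draws on the CEDict word forms.
toy_lexicon = {"我", "喜欢", "吃", "苹果"}
print(" ".join(segment("我喜欢吃苹果", toy_lexicon)))  # prints: 我 喜欢 吃 苹果
```

   Greedy matching of this kind is simple but can choose the wrong split when a longer dictionary entry overlaps a word boundary, which is one reason that removing or adding forms in the word list changes the segmentation.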

--Brian MacWhinney
