new Chinese lexicon

Brian MacWhinney macw at mac.com
Fri Aug 20 21:43:31 UTC 2004


Dear Info-CHILDES,
    I have now completed an initial version of a MOR system for Mandarin
Chinese (Putonghua).  This system is based on the lexicon from the CKIP
research group at Academia Sinica, reformatted to work with MOR.  It is
available now with the other MOR grammars on the web.  For this grammar,
we have both Simplified and Traditional forms, all in Unicode.  To test it
out, I have run the grammar on Zhou Jing's corpus, and it recognizes the
majority of the words.  However, Zhou Jing and I will need to work together
to get it to cover all the forms.  In most cases, the failures to recognize
a word result from missing spaces or missing special form markers.
    I will also test this out soon on some data from Chien-Ju Chang.  If
anyone else wants to try out the lexicon on their Chinese files, I can
help, or you may be able to do it on your own.  Right now the system is
configured primarily to analyze Hanzi characters as input, rather than
pinyin, although it may eventually be possible to use pinyin too.

--Brian MacWhinney


