updated English MOR, POST, and %mor

Brian MacWhinney macw at mac.com
Mon Nov 29 01:55:59 UTC 2004


Dear Info-CHILDES,
     Just a quick note to let you know that I have corrected the
training set for the POSTTRAIN program.  I used a new program called
TRNFIX, which finds every discrepancy between POST and MOR and lets the
user decide which analysis is correct.  I did this for the Brown Eve
corpus, which is my current training corpus.  Using only the adult
utterances in that corpus, I recreated the eng.db file used by POST.
Kenji Sagae and I have found that training on maternal input gives
better accuracy than training on maternal input combined with child
utterances, or on child utterances only.
    Using the new eng.db, I then reprocessed all of the Eng-USA and
Eng-UK corpora (with six exceptions) and produced a new disambiguated
%mor line that should now be more accurate than the previous one.  The
major errors in the previous version were confusions of determiners
with pronouns and of verbs with auxiliaries.
    The corpora that are still "in the shop" with respect to MOR are:
Clark, Hall, Forrester, Howe, Manchester, and Wells.  Once these six
are finished, we will move on to the English corpora in the narrative
and clinical directories.

--Brian MacWhinney


