revised POST database

Brian MacWhinney macw at cmu.edu
Mon Apr 17 01:33:44 UTC 2006


Dear Info-Chibolts,
     I spent most of this week trying to tie up some loose ends in
the MOR analysis for English files.  My first target was improving
the ability of the POST database to disambiguate words like "have"
and "is", along with their contracted forms, as either auxiliaries or
main verbs.  Because of some errors I had introduced into the Eve
training corpus, these forms were overwhelmingly being judged to be
main verbs.  With the errors in the training corpus corrected, most
of these misanalyses are now gone.  I also made some repairs to the
training corpus for disambiguating the word "to" as an infinitive
marker or a preposition, and "like" as a verb, preposition, or
subordinating conjunction.  There were also a variety of other minor
fixes.
    I then applied these fixes to the Manchester and Brown corpora
with good results.  So far, those are the only two corpora that have
been run through the new MOR and POST, but over the coming weeks we
will redo the rest of the database.  If you are actively using any
datasets, you will want to get these improved versions, or you can
run the new MOR and POST on the datasets yourself.
    It is very helpful to receive feedback about systematic errors in
MOR and POST.  Errors in incomplete two-word sentences are not very
useful to report, but for longer sentences from either the child or
the adults, systematic reports of error types can help guide me in
future repairs to, or expansion of, the training corpus.

--Brian MacWhinney


