MLU and %mor

Mon Mar 20 18:22:08 UTC 2006

Dear Info-CHILDES,
    I try to avoid posting details regarding CLAN programs to this  
list, but I have received enough emails on this issue to suggest that  
it is time to remind people once more of a change made about 24  
months ago in the operation of MLU.  The change is that MLU now operates by
default on the %mor line, not the main line.  In the database, there
is now a %mor line for all of the English files and for much of French,
German, Italian, Chinese, Japanese, and Spanish.
    If your corpus has a %mor line, then MLU will give you a true  
MLU.  If it does not, and if you then run MLU on the main line by  
adding the -t%mor switch, what you are going to get is not "MLU in  
morphemes" but "MLU in words".  Since the main line now includes no  
hyphenated words, MLU is going to count each word as one word and  
will do no morphemic analysis.  If you want to have an MLU in  
morphemes, you have to have a %mor line.
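    To make the difference concrete, here is a minimal sketch in
Python.  It is not CLAN code and does not reproduce CLAN's actual
counting rules.  It assumes, purely for illustration, that each "-"
in a %mor token marks one extra suffix morpheme, and the two sample
utterances and the function names are made up.

```python
# Illustrative sketch only: NOT CLAN's MLU implementation.

def mlu_words(main_lines):
    """Mean number of whitespace-separated words per utterance."""
    counts = [len(line.split()) for line in main_lines]
    return sum(counts) / len(counts)


def mlu_morphemes(mor_lines):
    """Mean morphemes per utterance.

    Assumption: every token counts as one morpheme, plus one more
    for each '-' suffix marker it carries (a simplification).
    """
    counts = []
    for line in mor_lines:
        tokens = line.split()
        counts.append(sum(1 + tok.count("-") for tok in tokens))
    return sum(counts) / len(counts)


# Hypothetical main-line utterances (no hyphenated words) ...
main = ["the dogs barked", "he walks"]
# ... and hypothetical %mor-style tags for the same utterances.
mor = ["det|the n|dog-PL v|bark-PAST", "pro|he v|walk-3S"]

print(mlu_words(main))     # 2.5
print(mlu_morphemes(mor))  # 4.0
```

Run on the main line, the count can only see whole words; the suffix
information that separates the two measures is present only in the
%mor-style tags.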
     This shift to reliance on the %mor line was a part of a general  
plan of increased support for computational linguistic tools in the  
most recent proposal  to NIH for continued funding for CHILDES.  The  
basic rationale is that coherent and replicable automatic
morphosyntactic analysis has to be conducted on the basis of a  
systematically tagged database.  Constructing a complete %mor line  
for all these files has been a huge job, as you can imagine.   
However, I am convinced that reliable progress in child language  
morphosyntactic analysis will proceed best through reliance on  
consistent computational tools.

--Brian MacWhinney, CMU

P.S.  For those who haven't got the time to construct %mor lines for
new data, it may be comforting to know that MLU in words is highly  
correlated with MLU in morphemes, at least for English.