MLU and %mor
Brian MacWhinney
macw at cmu.edu
Mon Mar 20 18:22:08 UTC 2006
Dear Info-CHILDES,
I try to avoid posting details regarding CLAN programs to this
list, but I have received enough emails on this issue to suggest that
it is time to remind people once more of a change made about 24
months ago in the operation of MLU: by default, MLU now operates on
the %mor line, not the main line. In the database, there is now a
%mor line for all of the English files and for much of the French,
German, Italian, Chinese, Japanese, and Spanish data.
If your corpus has a %mor line, then MLU will give you a true
MLU in morphemes. If it does not, and you instead run MLU on the
main line by adding the -t%mor switch, what you will get is not "MLU
in morphemes" but "MLU in words". Since the main line now contains no
hyphenated words, MLU will count each word as a single unit and will
do no morphemic analysis. If you want an MLU in morphemes, you have
to have a %mor line.
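To make the words-versus-morphemes difference concrete, here is a
minimal Python sketch. It is not CLAN's MLU algorithm. It assumes a
simplified %mor format in which each item is "pos|stem" followed by
optional "-SUFFIX" parts, counts each "-" suffix as one extra
morpheme, leaves fusional "&" markers uncounted, and ignores terminal
punctuation. The two sample utterances are made up for illustration,
and the function names are hypothetical.

```python
# Simplified illustration of MLU in words vs. MLU in morphemes.
# ASSUMPTIONS (not CLAN's actual rules): each %mor item is "pos|stem"
# optionally followed by "-SUFFIX" parts; each "-" suffix adds one
# morpheme; "&" fusional markers add nothing; terminators are ignored.

PUNCTUATION = {".", "?", "!"}


def words_in_main_line(main: str) -> int:
    """Count words on a main (*CHI:) line, ignoring terminators."""
    return sum(1 for tok in main.split() if tok not in PUNCTUATION)


def morphemes_in_mor_line(mor: str) -> int:
    """Count morphemes on a %mor line: one per stem plus one per '-' suffix."""
    total = 0
    for tok in mor.split():
        if tok in PUNCTUATION:
            continue
        # Drop the part-of-speech prefix; only '-' suffixes add morphemes.
        _pos, _, rest = tok.partition("|")
        total += 1 + rest.count("-")
    return total


def mlu(counts: list[int]) -> float:
    """Mean length of utterance from per-utterance counts."""
    return sum(counts) / len(counts)


# Hypothetical sample: each pair is (main line, %mor line).
utterances = [
    ("the dogs are running .", "det|the n|dog-PL aux|be&PRES part|run-PRESP ."),
    ("want cookie .", "v|want n|cookie ."),
]

mlu_words = mlu([words_in_main_line(main) for main, _ in utterances])
mlu_morphemes = mlu([morphemes_in_mor_line(mor) for _, mor in utterances])
print(f"MLU in words: {mlu_words:.2f}")          # MLU in words: 3.00
print(f"MLU in morphemes: {mlu_morphemes:.2f}")  # MLU in morphemes: 4.00
```

Without a %mor line, only the first count is available: "dogs" and
"running" each count as one word, whereas the %mor analysis credits
them with a plural and a participle suffix respectively.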
This shift to reliance on the %mor line was part of a general
plan for increased support for computational linguistic tools in the
most recent proposal to NIH for continued funding for CHILDES. The
basic rationale is that coherent and replicable automatic
morphosyntactic analysis has to be conducted on the basis of a
systematically tagged database. Constructing a complete %mor line
for all these files has been a huge job, as you can imagine.
However, I am convinced that reliable progress in child language
morphosyntactic analysis will proceed best through reliance on
consistent computational tools.
--Brian MacWhinney, CMU
P.S. For those who haven't got the time to construct %mor lines for
new data, it may be comforting to know that MLU in words is highly
correlated with MLU in morphemes, at least for English.