Issues with MLU and FREQ

Mon Mar 20 16:42:45 UTC 2006

Dear Brian
In the email from Dongping Zheng <dongping.zheng at uconn.edu> there is a comment
about using “-t%mor” in the command line when calculating MLU. I cannot
remember seeing any reference to this in any other exchange on MLU or in the CLAN
manual.

There is a major difference between including “-t%mor” in the command line and
omitting it. Could you explain just why this might be so? I would have expected
that the number of utterances would be somewhat related (e.g. half the number
when “-t%mor” is included) and the same for the morphemes, but there seems to
be absolutely no relation.
I give an example from Wells’ data below:
When I used the command “mlu +t*CHI -t%mor +f jonath*.cha”, the result for the
file Jonath08.cha was as follows:
Number of: utterances = 50, morphemes = 241
	Ratio of morphemes over utterances = 4.820
	Standard deviation = 3.882

When I used the command “mlu +t*CHI +f jonath*.cha” (leaving out “-t%mor”), the
result was:
Number of: utterances = 252, morphemes = 691
	Ratio of morphemes over utterances = 2.742
	Standard deviation = 2.070
The same situation arose for all the files in this corpus: the two commands
gave counts of utterances and morphemes that bore no relation to one
another.
My question is: which formula is to be taken as the appropriate one for
calculating MLU? Obviously there is a huge difference in the calculation.
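
As a check on the arithmetic, the ratio in each output is simply the morpheme
count divided by the utterance count. The following is a minimal sketch in
Python that uses only the counts from the two outputs above; the function name
`mlu` and the dictionary labels are illustrative and are not part of CLAN. It
reproduces both ratios and shows how differently the two counts scale between
the runs.

```python
# Counts copied from the two MLU outputs for Jonath08.cha quoted above.
runs = {
    "with -t%mor": {"utterances": 50, "morphemes": 241},
    "without -t%mor": {"utterances": 252, "morphemes": 691},
}


def mlu(morphemes, utterances):
    """Ratio of morphemes over utterances, as reported in the output."""
    return morphemes / utterances


for label, counts in runs.items():
    ratio = mlu(counts["morphemes"], counts["utterances"])
    print(f"{label}: ratio = {ratio:.3f}")

# Compare the two runs count by count. If one run were a simple multiple
# of the other, these two factors would be roughly equal.
utt_ratio = runs["without -t%mor"]["utterances"] / runs["with -t%mor"]["utterances"]
mor_ratio = runs["without -t%mor"]["morphemes"] / runs["with -t%mor"]["morphemes"]
print(f"utterance factor = {utt_ratio:.2f}, morpheme factor = {mor_ratio:.2f}")
```

The utterance count differs by a factor of about 5 and the morpheme count by a
factor of about 2.9, so neither result is a simple multiple of the other.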

A second related question: How reliable is MLU considered to be as an index of
development nowadays? Obviously this is an important issue, for example in
attempting to find at what point features like regular past tense appear.

Finally, a question about FREQ and the number of utterances examined. The output
of FREQ for any feature in the CHI tier contains a line - ### *CHI:, where ###
is some number. I presumed this was the number of utterances of the child in
that transcript. Obviously this cannot be the case.
Below, for example, is the output of FREQ in the file Jonath08.cha referred to
above for irregular pasts (command "freq +d2 +t*CHI +t%mor +s"*&PAST*" +f
jonath*.cha"). It was necessary to include the MOR tier to find "*&PAST*". What
DOES the figure 310 in this case refer to? It seems to bear no relation to the
number of utterances calculated for MLU.

@ID: en|wells|CHI|2;11.29||||Target_Child||
310 *CHI:
1 v:aux|have&past
3 v|be&past&13s
etc.
@ST:
   10    22   0.455

Best wishes
Sean Devitt

Quoting Brian MacWhinney <macw at mac.com>:

> Dear Dongping,
>      You can use GEM as the "frontend" for MLU and other programs by
> using the "piping" feature. Section 2.3.5 of the introductory tutorial
> describes this a bit, and there are further descriptions in Section 7,
> Exercises.
> Basically, GEM just does the work of narrowing down the material that
> will go into FREQ with the +d2 option to produce a STATFREQ input.
>
> --Brian MacWhinney
>
> On Feb 13, 2006, at 9:01 PM, Dongping Zheng wrote:
>
> > Hi,
> >
> > I just wanted to thank Brian and Bracha for their help. Here is the
> > trick and solution:
> >
> > Adding the -t%mor tag following the MLU command when analyzing MLU
> > in words.
> >
> >
> >
> > My analysis is getting more involved. It is very exciting to come to
> > this point, since I have learned so much about CHILDES and from this
> > Listserv. I used STATFREQ and “mlu -t%mor -s"[+ bch]" +d +tBET *.cha”
> > and was able to generate data and input it into Excel.
> > I wonder if there is a way to combine GEM with STATFREQ, and GEM with
> > “mlu -t%mor -s"[+ bch]" +d +tBET *.cha”, so that I can generate data
> > to input into Excel rather than keying it in by hand.
> > Thanks again for your help!
> > Dongping
> >
> >
> > <<<<<<<<<<<<>>>>>>>>>>>>>
> >
> > Dongping Zheng, ABD
> > Department of Educational Psychology
> > University of Connecticut
> >
> > 249 Glenbrook Rd. Unit 2064
> >
> > Storrs, CT 06269
> > dongping.zheng at uconn.edu
> > http://www.education2.uconn.edu/epsy240/dzheng/index.htm
> >
> >
> >
> > Webmaster @ Universal Design for Instruction
> > http://www.facultyware.uconn.edu/home.htm
> >
> > From: info-childes at mail.talkbank.org
> > [mailto:info-childes at mail.talkbank.org] On Behalf Of Bracha Nir-Sagiv
> > Sent: Friday, February 10, 2006 2:57 AM
> > To: Dongping Zheng
> > Cc: info-childes at mail.talkbank.org
> > Subject: Re: Gem and MLU
> >
> >
> >
> > Dear Dongping,
> > Try adding the -t%mor tag following the MLU command - in the
> > current version of CLAN, MLU works automatically on the %MOR tier
> > and you need to tell the program to disregard it.
> > Hope this helps,
> > Bracha
> >
> > Dongping Zheng wrote:
> >
> >
> > Hi,
> >
> > I was trying to run this command: gem +sgreet +t*LUL +t*SEA +t*BET
> > +t*LIZ +d *.cha | MLU and I got this message in the output file:
> >
> >
> >
> > TIER "%MOR" HASN'T BEEN FOUND IN THE INPUT DATA!
> >
> >
> >
> > I don’t have a %MOR tier in the input data; I have LUL, SEA, BET and
> > LIZ tiers.
> >
> >
> >
> > Would you help me to solve this problem? Oh, I use Windows XP.
> >
> >
> >
> > Thank you so much!
> > Dongping
> >
> >
> >
> >
> >
> >
>
>

Dr. Seán Devitt, F.T.C.D.
Senior Lecturer in Education,
Education Department,
Trinity College, University of Dublin
Dublin, Ireland.
Phone: (353 1) 608 1293.