different MLUs for different CLAN versions

Tue Sep 28 19:01:08 UTC 2010

Dear Jamie,
     What matters in this case is not the version of CLAN, but the version of the Adam corpus that was being used.  If Bobbi was using a newer version of the corpus and her students an older version, then my account could be right.  I just now checked the number of &um, &ah, &aw, &mm and &um in Adam and the numbers add up just about right.
   But it could be that something else is going on.  One way to check this is to try the same single file with the different versions and add file by file until they diverge.  Leonid is on vacation this week, but he may have some comments on this later on.
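
A rough way to run this check outside CLAN is sketched below in Python. It is only an illustration under stated assumptions: it assumes the usual CHAT layout (main tiers begin with "*CHI:", dependent tiers with "%", headers with "@", continuation lines with a tab) and the filler marking of this period, where fillers are written as "&um", "&ah" and so on. The names count_fillers and mlu_ratio are made up for this sketch and are not part of CLAN. Running count_fillers on the same file from each copy of the corpus shows whether the filler marking differs between them, and mlu_ratio reproduces the ratios from the reported utterance and morpheme counts.

```python
import re
from collections import Counter

# Assumption: fillers are written as "&" followed by letters (e.g. "&um").
FILLER = re.compile(r"&(\w+)")


def count_fillers(chat_text, speaker="*CHI"):
    """Count &-prefixed forms on one speaker's main tiers in CHAT-format text."""
    counts = Counter()
    in_speaker_tier = False
    for line in chat_text.splitlines():
        if line.startswith("*"):
            # A new main tier: track it only if it belongs to the speaker.
            in_speaker_tier = line.startswith(speaker + ":")
        elif line.startswith(("%", "@")):
            # Dependent tiers and headers end the current main tier.
            in_speaker_tier = False
        # Any other line (e.g. one starting with a tab) continues the
        # current tier, so the state is left unchanged.
        if in_speaker_tier:
            counts.update(m.group(1) for m in FILLER.finditer(line))
    return counts


def mlu_ratio(morphemes, utterances):
    """Morphemes per utterance, rounded to three places as in the mlu output."""
    return round(morphemes / utterances, 3)


if __name__ == "__main__":
    # Counts reported in this thread for adam01.cha.
    print(mlu_ratio(2582, 1232))  # 02-Jul-2010 run
    print(mlu_ratio(2644, 1232))  # 08-Sep-2010 run
    print(mlu_ratio(2582, 1236))  # browsable database run
    print(2644 - 2582)            # morpheme difference between the two runs
```

To compare two copies of a file, read each with open(path, encoding="utf-8").read(), pass the text to count_fillers, and compare the two Counter results; the first file where they differ is the place to look.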

-- Brian

On Sep 28, 2010, at 2:41 PM, Jamie Smith wrote:

> Wouldn't that result in fewer morphemes in the newer MLU calculation?
> It looks like there are more morphemes counted with the newer version
> of CLAN (where fillers were excluded), fewer with the older version
> (where fillers were included).
> 
> Jamie
> 
> On Sep 28, 1:32 pm, Brian MacWhinney <m... at cmu.edu> wrote:
>> Dear Bobbi,
>> 
>>     Thanks for the clear report on this.  The July version of CLAN was using the %mor line, so that cannot be the difference.  I think this change is due to the treatment of fillers in MOR.  Before July, forms like "um" and "uh" were being recognized as lexical items on the %mor line, but not on the main line.  This was essentially a "bug" in the way MLU was working, resulting from the transition from computing on the main line to computing on the %mor line.  To fix this, I changed all of the "um" forms to "&um".  This keeps them off the %mor line, which is the correct treatment.
>> 
>> -- Brian MacWhinney
>> On Sep 28, 2010, at 12:45 PM, RCorrigan wrote:
>> 
>>> For homework, I had my Language Acquisition class compute a simple MLU
>>> on Adam's speech in adam01. I noticed that they were getting a
>>> different answer than I did. It turns out that I was using a July
>>> version of CLAN and they were using a September version.
>>> My output looked like this:
>> 
>>> From file <adam01.cha>
>>> mlu +t*CHI adam01.cha
>>> Mon Sep 27 07:14:31 2010
>>> mlu (02-Jul-2010) is conducting analyses on:
>>>  ONLY speaker main tiers matching: *CHI;
>>> ****************************************
>>> From file <adam01.cha>
>>> MLU for Speaker: *CHI
>>>  MLU (xxx and yyy are EXCLUDED from the utterance and morpheme
>>> counts):
>>>    Number of: utterances = 1232, morphemes = 2582
>>>    Ratio of morphemes over utterances = 2.096
>>>    Standard deviation = 1.024
>> 
>>> Their output looked like this:
>>> mlu +t*chi adam01.cha
>>> Tue Sep 28 11:33:56 2010
>>> mlu (08-Sep-2010) is conducting analyses on:
>>>  ONLY dependent tiers matching: %MOR;
>>> ****************************************
>>> From file <adam01.cha>
>>> MLU for Speaker: *CHI
>>>  MLU (xxx and yyy are EXCLUDED from the utterance and morpheme
>>> counts):
>>>    Number of: utterances = 1232, morphemes = 2644
>>>    Ratio of morphemes over utterances = 2.146
>>>    Standard deviation = 1.067
>> 
>>> I know it's not a big difference, but why should MLUs change from one
>>> slight revision of CLAN to the next? The best I can figure, the July
>>> version must not have been calculating MLU on the %MOR line, but all
>>> the documentation for years has claimed it was.
>> 
>>> In addition, the browsable database is using yet another July
>>> version and is giving slightly different answers than the other two
>>> (both the utterance and morpheme counts are different on this one)
>> 
>>> mlu +t*chi adam01.cha
>> 
>>> Tue Sep 28 12:36:51 2010 mlu (10-Jun-2009) is conducting analyses on:
>>> ONLY speaker main tiers matching: *CHI;
>>> ****************************************
>>> From file "adam01.cha"
>>> MLU for Speaker: *CHI
>>> MLU (xxx and yyy are EXCLUDED from the utterance and morpheme counts):
>>> Number of: utterances = 1236, morphemes = 2582
>>> Ratio of morphemes over utterances = 2.089
>>> Standard deviation = 1.030
>> 
>>> Thanks for letting me know what is going on.
>> 
>>> Bobbi Corrigan
>> 
>>> --
>>> You received this message because you are subscribed to the Google Groups "chibolts" group.
>>> To post to this group, send email to chibolts at googlegroups.com.
>>> To unsubscribe from this group, send email to chibolts+unsubscribe at googlegroups.com.
>>> For more options, visit this group at http://groups.google.com/group/chibolts?hl=en.