MLU giving odd results
Brian MacWhinney
macw at cmu.edu
Thu Oct 10 00:45:30 UTC 2024
Peter,
The shift to tagging with UD (Universal Dependencies) radically alters the meaning of MLU, because UD outputs all the grammatical features inherent in a stem. This is really important for crosslinguistic analysis, but it is certainly a change. Take a look at the shape of the %mor line in those files.
However, I am glad you called my attention to this, because a few of those features should not be getting into the output and I need to fix this.
If you want to stick with the 1973-2023 version of MLU, you could either rely on MLU in words, which is actually pretty close, or work with the older tagging of the corpora, which you can get from https://childes.talkbank.org/access/Eng-NA/ (click on the link in the second line).
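For a quick sanity check on MLU in words outside CLAN, here is a rough Python sketch. It is only an approximation under my own assumptions, not what the mlu program actually does: it counts whitespace-separated tokens on the main *CHI: lines, keeps only tokens that begin with a letter, drops xxx, yyy and www, and ignores continuation lines and CHAT codes. The function name mlu_in_words and the sample utterances are made up for illustration and are not taken from the Adam files.

```python
import re

# Placeholders that the mlu output says are excluded from the counts.
EXCLUDED = {"xxx", "yyy", "www"}


def mlu_in_words(chat_lines, speaker="*CHI:"):
    """Return (utterances, words, words per utterance) for one speaker.

    Hypothetical helper: a crude word-based MLU, not CLAN's algorithm.
    """
    utterances = 0
    words = 0
    for line in chat_lines:
        if not line.startswith(speaker):
            continue  # skip other speakers and dependent tiers such as %mor
        text = line[len(speaker):]
        # Keep tokens that start with a letter; this drops terminators (. ? !).
        tokens = [
            t for t in text.split()
            if re.match(r"[A-Za-z]", t) and t not in EXCLUDED
        ]
        if not tokens:
            continue  # nothing countable, so the utterance is not counted
        utterances += 1
        words += len(tokens)
    ratio = words / utterances if utterances else 0.0
    return utterances, words, ratio


# Invented sample lines in CHAT style (tab after the speaker code).
sample = [
    "*CHI:\tmore cookie .",
    "*MOT:\tyou want more ?",
    "*CHI:\twant that .",
    "*CHI:\txxx .",
    "*CHI:\tdoggie .",
]
print(mlu_in_words(sample))

# The ratio reported for 020304.cha is simply morphemes / utterances.
print(round(5174 / 1239, 3))
```

On the invented sample this counts 3 utterances and 5 words for *CHI. The last line just reproduces the 4.176 in the output below from the reported counts (5174 morphemes over 1239 utterances), so the arithmetic is fine and the inflation is in the morpheme count itself.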
—Brian
> On Oct 9, 2024, at 5:46 PM, Gordon, Peter <pgordon at tc.edu> wrote:
>
> I just taught a class where students do a simple MLU analysis to get used to CHILDES. As I was doing it in class, I noticed that the MLUs for Adam did not look right. His MLU for the first sample was 4.176, despite having mostly single-word utterances. Any thoughts?
>
> Peter
>
> mlu +tchi childes/Eng-NA/Brown/Adam/*.cha
> Wed Oct 9 17:38:29 2024
> mlu (29-Oct-2020) is conducting analyses on:
> ONLY dependent tiers matching: %MOR;
> ****************************************
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020304.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 1239, morphemes = 5174
> Ratio of morphemes over utterances = 4.176
> Standard deviation = 2.946
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020318.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 1272, morphemes = 5062
> Ratio of morphemes over utterances = 3.980
> Standard deviation = 2.767
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020403.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 830, morphemes = 3964
> Ratio of morphemes over utterances = 4.776
> Standard deviation = 3.062
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020415.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 774, morphemes = 2870
> Ratio of morphemes over utterances = 3.708
> Standard deviation = 2.546
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020430.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 837, morphemes = 3679
> Ratio of morphemes over utterances = 4.395
> Standard deviation = 3.146
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020512.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 810, morphemes = 3392
> Ratio of morphemes over utterances = 4.188
> Standard deviation = 3.216
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020603.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 849, morphemes = 4548
> Ratio of morphemes over utterances = 5.357
> Standard deviation = 4.005
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020617.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 635, morphemes = 4197
> Ratio of morphemes over utterances = 6.609
> Standard deviation = 4.429
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020701.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 853, morphemes = 4596
> Ratio of morphemes over utterances = 5.388
> Standard deviation = 3.860
>
>
>
>
>
> _________________________________________________________________
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> From file <childes/Eng-NA/Brown/Adam/020714.cha>
> MLU for Speaker: *CHI:
> MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
> Number of: utterances = 912, morphemes = 5096
> Ratio of morphemes over utterances = 5.588
> Standard deviation = 4.284
>
>
>
>
> --
> Peter Gordon
> Pronouns: He/His/Him
> Associate Professor
> Biobehavioral Sciences and Human Development
> Teachers College, Columbia University
> 525 West 120th Street, Box 306
> New York, NY 10027
> Email pgordon at tc.edu | p: (212) 678-8162
>
>
> --
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/CAJE3P%2B_TiyPA_-oQBAMppkUF6mP6OBpSZUjK3W_KoJDK5BcK7g%40mail.gmail.com.