MLU in characters?
Leonid Spektor
spektor at andrew.cmu.edu
Tue Jul 29 22:33:29 UTC 2025
I have changed MLU to count characters. The new options are -bw for counting words and -bc for counting characters. Without the -b option, MLU counts morphemes.
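
For anyone who wants to check the arithmetic outside CLAN, here is a minimal Python sketch of the character count as it is described in the quoted messages below: count the characters in each word, sum them per utterance, and average over utterances. This is not CLAN's code. The function name `mean_length` and the sample utterances are made up for illustration, and the sketch applies none of the MLU inclusion rules from the manual, so it assumes the utterances have already been filtered and cleaned.

```python
def mean_length(utterances, unit="chars"):
    """Mean length of utterance in characters or in words.

    utterances: list of strings, one utterance per string, assumed to be
    already filtered (no excluded utterances, no transcription codes).
    unit: "chars" sums the characters of every word in the utterance;
          "words" counts whitespace-separated words.
    """
    if unit not in ("chars", "words"):
        raise ValueError("unit must be 'chars' or 'words'")

    lengths = []
    for utterance in utterances:
        words = utterance.split()
        if not words:
            continue  # an empty utterance contributes nothing
        if unit == "chars":
            # Spaces are not counted, only the characters inside words.
            lengths.append(sum(len(word) for word in words))
        else:
            lengths.append(len(words))

    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical sample data, not taken from any corpus.
english = ["the dog runs", "look at that"]
spanish = ["el perro corre", "mira eso"]

for name, sample in (("English", english), ("Spanish", spanish)):
    print(name, mean_length(sample, "words"), mean_length(sample, "chars"))
```

Python's `len` counts Unicode code points, so an accented letter such as "é" counts as one character when the text is stored in composed form. Whether CLAN counts accented characters the same way is not stated in this thread.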
New CLAN is on the web.
Leonid.
> On Jul 29, 2025, at 14:28, Janet Bang <janet.bang at sjsu.edu> wrote:
>
> Hi everyone,
>
> Thank you for your ideas. The thought of doing the manual version, inserting a space, had also crossed our minds! We'll also look into WDLEN.
>
> @Leonid Spektor, yes, I think that would work for our exploratory use case (comparing types, tokens, and MLU for English and Spanish using morphemes, words, and characters). We are still in the early stages.
>
> Would the +b option consider the same words and utterances that would be counted with MLUw? Or would this disregard the MLU rules that are built in?
>
> Janet
>
> On Tue, Jul 29, 2025 at 11:01 AM Leonid Spektor <spektor at andrew.cmu.edu <mailto:spektor at andrew.cmu.edu>> wrote:
>> Hi,
>>
>> It is easy to add an option to MLU to count characters over utterances. Currently MLU counts words or morphemes over utterances.
>>
>> Just to confirm I understand what you want: I will change the +b option to count characters or words. When counting characters, the number of characters in each word will be counted, and the sum over all words will be used to compute MLU over utterances. Is this what you want?
>>
>> If it is, then I will put a new version of CLAN on the web by the end of today.
>>
>>
>> Leonid.
>>
>>> On Jul 29, 2025, at 13:24, Nan Bernstein Ratner <nratner at umd.edu <mailto:nratner at umd.edu>> wrote:
>>>
>>> Couldn't WDLEN do something in this regard? It counts characters...
>>>
>>> Nan Bernstein Ratner, F-, H-ASHA, F-AAAS, Board Certified Specialist in Stuttering, Cluttering, and Fluency Disorders
>>> she/her/hers
>>> Distinguished University Professor
>>> Hearing and Speech Sciences
>>> University of Maryland
>>> 0100 Lefrak Hall, 7251 Preinkert Drive
>>> College Park, MD 20742
>>> nratner at umd.edu <mailto:nratner at umd.edu>, 301-405-4217 My Zoom <https://umd.zoom.us/j/7924324343>
>>> Co-director: FluencyBank (www.fluency.talkbank.org <http://www.fluency.talkbank.org/>); http://languagefluency.umd.edu/
>>>
>>> Faculty, Language Science (languagescience.umd.edu <http://languagescience.umd.edu/>); Neuroscience & Cognitive Neuroscience (NACS, nacs.umd.edu <http://nacs.umd.edu/>); Developmental Science Field Committee
>>>
>>> https://hesp.umd.edu/facultyprofile/bernstein-ratner/nan
>>>
>>>
>>> My PubMed bibliography: https://www.ncbi.nlm.nih.gov/myncbi/1RORcBHUvuRQ82/bibliography/public/
>>>
>>>
>>> On Tue, Jul 29, 2025 at 11:27 AM Shanley <allen at rhrk.uni-kl.de <mailto:allen at rhrk.uni-kl.de>> wrote:
>>>> The poor person's workaround: you could tweak the system by making each character into a word, i.e. by putting a space between every character on whatever tier you're using to count MLU. Surely a Python script could easily do this for you.
>>>>
>>>> A more complicated variant would be to write a Python script that calculates what you want from the existing file.
>>>>
>>>> In both cases, you should of course take Leonid's observation below into account: you would first need to decide which words and utterances should be included.
>>>>
>>>> Best,
>>>> Shanley Allen.
>>>>
>>>>
>>>>
>>>>> On 25. Jul 2025, at 15:15, 'Janet Bang' via chibolts <chibolts at googlegroups.com <mailto:chibolts at googlegroups.com>> wrote:
>>>>>
>>>>> Got it, thank you!
>>>>>
>>>>> On Fri, Jul 25, 2025 at 12:12 PM Leonid Spektor <spektor at andrew.cmu.edu <mailto:spektor at andrew.cmu.edu>> wrote:
>>>>>> Janet,
>>>>>>
>>>>>> I am sorry to say it, but MLU can only count words or morphemes.
>>>>>>
>>>>>> If you plan to use another program, then please keep in mind that MLU uses a lot of rules to decide whether an utterance or word should be counted. You can read those rules in the CLAN manual at https://talkbank.org/0info/manuals/CLAN.pdf. Please look for chapter 7.19, "MLU", in the manual.
>>>>>>
>>>>>>
>>>>>> Leonid.
>>>>>>
>>>>>>> On Jul 25, 2025, at 14:53, 'Janet Bang' via chibolts <chibolts at googlegroups.com <mailto:chibolts at googlegroups.com>> wrote:
>>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> Is there a way to use the MLU program to extract MLU in characters? We are exploring measures to facilitate cross-linguistic comparisons between English and Spanish and someone had recommended using characters (over MLU words) given the orthographic transparency of Spanish.
>>>>>>>
>>>>>>> We saw some other programs on GitHub, but I was hoping there was something within CLAN because we had already used MOR within CLAN.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Janet
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google Groups "chibolts" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com <mailto:chibolts+unsubscribe at googlegroups.com>.
>>>>>>> To view this discussion visit https://groups.google.com/d/msgid/chibolts/9b2ae135-2fdb-4b55-b9f2-06886ace8217n%40googlegroups.com <https://groups.google.com/d/msgid/chibolts/9b2ae135-2fdb-4b55-b9f2-06886ace8217n%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Janet Y. Bang, Ph.D (she/her/hers)
>>>>> Assistant Professor
>>>>> Child and Adolescent Development
>>>>> Lurie College of Education, San José State University
>>>>> janet.bang at sjsu.edu <mailto:janet.bang at sjsu.edu> | 408-924-3714
>>>>> https://www.sjsu.edu/education/faculty/janet-bang.php
>>>>>
>>>>
>>>> ********************************************************************************
>>>> Prof. Dr. Shanley E. M. Allen
>>>> Director, Psycholinguistics and Language Development Group
>>>> Center for Cognitive Science
>>>> University of Kaiserslautern-Landau
>>>> Erwin-Schrödinger-Straße 57/409
>>>> 67663 Kaiserslautern
>>>> Germany
>>>>
>>>> e-mail: allen at rptu.de <mailto:allen at rptu.de>
>>>> phone: +49-631-205-4136
>>>> fax: +49-631-205-5182
>>>> office: Building 57, Office 409
>>>> web: http://www.sowi.uni-kl.de/psycholinguistics/home/
>>>> ********************************************************************************
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>