[Lingtyp] Language learning by machines (LLMs) and by humans (L1) (results)

Stela Manova manova.stela at gmail.com
Sun Nov 24 20:59:56 UTC 2024


Dear Colleagues,

At the end of August, I asked for help with L1 English data for a comparison of language learning by LLMs and by humans. I promised to post the results of the comparison and that is why I am writing now.

I changed my initial strategy: instead of focusing on the first 100 words in L1 English and checking whether they are part of ChatGPT’s vocabulary, I compared the first component of ChatGPT, the tokenizer, with its logical parallel, the early stages of L1 acquisition. The paper “Machine learning versus human learning: Basic units and form-meaning mapping” can be accessed at: https://lingbuzz.net/lingbuzz/008548; a more detailed presentation of tokenization is given in “ChatGPT and linguistic theory, with a focus on morphology”, available at: https://ling.auf.net/lingbuzz/008600. Both papers have been submitted for inclusion in edited collections and are currently under review.

Finally, I would like to thank Arnold Zwicky for the help with my L1 English query and Eve Clark for the response.

Best,
Stela

*********************
 Dr. Stela MANOVA
Email addresses:
manova at gaussaiglobal.com
manova.stela at gmail.com

Websites:
https://www.gaussaiglobal.com
https://sites.google.com/view/stelamanova

________________________________
From: Arnold Zwicky <arnold.zwicky at gmail.com>
Sent: Saturday, August 10, 2024 12:25 AM
To: Stela Manova <manova.stela at gmail.com>; Linguistic Linguistic Typology <lingtyp at listserv.linguistlist.org>
Subject: Re: [Lingtyp] L1 English



> On Aug 9, 2024, at 12:21 PM, Stela Manova via Lingtyp <lingtyp at listserv.linguistlist.org> wrote:
>
>  I am writing to ask for your help with the following issue. I have been working on the relationship between LLMs and linguistic theory and now it is time to check how language acquisition happens in humans and machines. I am therefore looking for the very first, at least, 100 words of children with L1 English (in the ideal case, data should come from different varieties of the language). ... Please feel free to forward the query to linguists that are not on the list.

I appealed to my old friend and illustrious colleague Eve Clark, and here are her musings, for you to use as you wish. Please don't engage with me on this (this is not at all my field) or with EC (who just gave a quick reaction).

,,,,,

We know a lot about the first 100+ words children produce (their early comprehension isn’t as well documented) but parental reports can be somewhat unreliable at times. All this is in the CHILDES Archive, of course, with many further analyses of the CDI data also in Michael Frank’s recent book:

Frank, Michael C., Braginsky, Mika, Yurovsky, Daniel, & Marchman, Virginia A. 2021. Variability and Consistency in Early Language Learning: The Wordbank project. Cambridge, MA: MIT Press.

Another source I’d recommend is an article by Warstadt & Bowman:

Warstadt, Alex, & Bowman, Samuel R. 2022. What artificial neural networks can tell us about human language acquisition. In S. Lappin & J.-P. Bernardy (eds.), Algebraic Structures in Natural Language (pp. 17-29). London: CRC Press.
 (They compare the actual amounts of input required for different models to learn from, compared to human infants…)

My general impression is that no one in the LLM ‘field’ knows anything about human language acquisition, certainly not the details of early perception, phonological sequences, word identification/recognition, word combination, speech acts, etc., etc. For a recent taste of this, see the fourth edition of
Clark, E. V. 2024. First Language Acquisition (Cambridge UP) for all that will need to be accounted for…
.....

Arnold

