<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="direction: ltr; text-align: justify; line-height: 1.2; margin-top: 0pt; margin-bottom: 0pt; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Dear Colleagues, </div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="direction: ltr; text-align: justify; line-height: 1.2; margin-top: 0pt; margin-bottom: 0pt; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
In August, I asked for help with L1 English data for a comparison of language learning by LLMs and by humans. I promised to post the results of that comparison, which is why I am writing now.</div>
<div class="elementToProof" style="direction: ltr; text-align: justify; line-height: 1.2; margin-top: 0pt; margin-bottom: 0pt; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="direction: ltr; text-align: justify; line-height: 1.2; margin-top: 0pt; margin-bottom: 0pt; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I changed my initial strategy: instead of focusing on the first 100 words of L1 English and checking whether they are part of ChatGPT’s vocabulary, I compared the first component of ChatGPT, the tokenizer, with its logical parallel, the early stages
of L1 acquisition. The paper “Machine learning versus human learning: Basic units and form-meaning mapping” can be accessed at:
<span style="color: rgb(17, 85, 204);"><u><a href="https://lingbuzz.net/lingbuzz/008548" id="OWA29656c7c-0434-312d-d971-15e4b8693471" class="OWAAutoLink" data-auth="NotApplicable" style="color: rgb(17, 85, 204);">https://lingbuzz.net/lingbuzz/008548</a></u></span>;
a more detailed presentation of tokenization is given in “ChatGPT and linguistic theory, with a focus on morphology”, available at:
<span style="color: rgb(17, 85, 204);"><u><a href="https://ling.auf.net/lingbuzz/008600" id="OWAb9ce9a22-9068-4d28-d2b9-67820e50e28c" class="OWAAutoLink" data-auth="NotApplicable" style="color: rgb(17, 85, 204);">https://ling.auf.net/lingbuzz/008600</a></u></span>.
Both papers have been submitted to edited collections and are currently under review. </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="direction: ltr; text-align: justify; line-height: 1.2; margin-top: 0pt; margin-bottom: 0pt; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Finally, I would like to thank Arnold Zwicky for his help with my L1 English query and Eve Clark for her response. </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="direction: ltr; text-align: justify; line-height: 1.2; margin-top: 0pt; margin-bottom: 0pt; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Best, </div>
<div class="elementToProof" style="direction: ltr; text-align: justify; line-height: 1.2; margin-top: 0pt; margin-bottom: 0pt; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Stela</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" id="Signature">
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);"><b>*********************</b></span></div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 14pt; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);"><b> Dr. Stela MANOVA</b></span></div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);"><img id="image_0" width="155" height="106" size="27370" contenttype="image/png" style="width: 155px; height: 106px; max-width: 506px; margin: 0px;" data-outlook-trace="F:1|T:1" src="cid:8f89f40b-cbd8-45a1-8bba-c90f5e567169"></span></div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0cm; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Email addresses: </div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0cm; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 14.6667px; color: rgb(12, 100, 192);">
<span style="background-color: rgb(255, 255, 255);">manova@gaussaiglobal.com</span></div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0cm; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(12, 100, 192);">
manova.stela@gmail.com </div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0cm; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0cm; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Websites: </div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0cm; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 14.6667px; color: rgb(12, 100, 192);">
<a href="https://www.gaussaiglobal.com" id="OWAfa6a70ef-c884-b9dd-38fc-8040d166f5ad" class="OWAAutoLink" data-auth="NotApplicable" style="color: rgb(12, 100, 192); margin: 0px;">https://www.gaussaiglobal.com</a></div>
<div style="text-align: left; text-indent: 0px; background-color: rgb(255, 255, 255); margin: 0cm; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 14.6667px; color: rgb(12, 100, 192);">
<a href="https://sites.google.com/view/stelamanova" id="OWA11566dfc-0ddc-5758-f0a3-b84086a8c906" class="OWAAutoLink" data-auth="NotApplicable" style="color: rgb(12, 100, 192);">https://sites.google.com/view/stelamanova</a></div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
</div>
<div id="appendonsend"></div>
<hr style="display: inline-block; width: 98%;">
<div dir="ltr" id="divRplyFwdMsg"><span style="font-family: Calibri, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);"><b>From:</b> Arnold Zwicky <arnold.zwicky@gmail.com><br>
<b>Sent:</b> Saturday, August 10, 2024 12:25 AM<br>
<b>To:</b> Stela Manova <manova.stela@gmail.com>; Linguistic Linguistic Typology <lingtyp@listserv.linguistlist.org><br>
<b>Subject:</b> Re: [Lingtyp] L1 English</span>
<div> </div>
</div>
<div style="font-size: 11pt;"><br>
<br>
> On Aug 9, 2024, at 12:21 PM, Stela Manova via Lingtyp <lingtyp@listserv.linguistlist.org> wrote:<br>
><br>
> I am writing to ask for your help with the following issue. I have been working on the relationship between LLMs and linguistic theory, and now it is time to check how language acquisition happens in humans and machines. I am therefore looking for at least the very
first 100 words of children with L1 English (ideally, the data should come from different varieties of the language). ... Please feel free to forward the query to linguists who are not on the list.<br>
<br>
I appealed to my old friend and illustrious colleague Eve Clark, and here are her musings, for you to use as you wish. Please don't engage with me on this (this is not at all my field) or with EC (who just gave a quick reaction).<br>
<br>
.....<br>
<br>
We know a lot about the first 100+ words children produce (their early comprehension isn’t as well documented), but parental reports can be somewhat unreliable at times. All this is in the CHILDES Archive, of course, with many further analyses of the CDI data
also in Michael Frank’s recent book:<br>
<br>
Frank, Michael C., Braginsky, Mika, Yurovsky, Daniel, & Marchman, Virginia A. 2021. Variability and Consistency in Early Language Learning: The Wordbank Project. Cambridge, MA: MIT Press.<br>
<br>
Another source I’d recommend is an article by Warstadt & Bowman:<br>
<br>
Warstadt, Alex, & Bowman, Samuel R. 2022. What artificial neural networks can tell us about human language acquisition. In S. Lappin & J.-P. Bernardy (eds.), Algebraic Structures in Natural Language (pp. 17-29). London: CRC Press.<br>
(They compare the actual amounts of input required for different models to learn from, compared to human infants…)<br>
<br>
My general impression is that no-one in the LLM ‘field’ knows anything about human language acquisition, certainly not the details of early perception, phonological sequences, word identification/recognition, word combination, speech acts, etc., etc. For a recent
taste of this, see the fourth edition of<br>
Clark, E. V. 2024. First Language Acquisition (Cambridge UP) for all that will need to be accounted for…<br>
.....<br>
<br>
Arnold<br>
<br>
<br>
</div>
</body>
</html>