[Lingtyp] On AI, Language, and (In)Human Thinking
Stela MANOVA
manova.stela at gmail.com
Sun Nov 16 14:36:09 UTC 2025
Dear colleagues,
Although the task of examining the terms backformation and borrowed morphology is much easier than finding a real-world activity in which form and meaning can be separated yet restored to one another, there have been no answers this time. Perhaps this is because examining terminology might be perceived as potential criticism of work by well-known scholars working on the topic, i.e. some may see engaging with it as risky for their careers.
Since I am Bulgarian and, according to the linguistic logic here in Vienna, I do not have even a hypothetical chance of a linguistic career, I think I can afford to reply.
Backformation
Imagine you buy a microwave, take it home, and then “back-form” it by removing, say, the door, because the microwave looks very similar to an oven, and there are grills that look like ovens without doors (think of the removal of an existing affix in backformation, e.g., of -or from editor to form the verb to edit; the direction of the derivation implies that editor is diachronically older). The final result will be a useless microwave. In other words: backformation = destruction. Why, then, is backformation considered a legitimate word-formation process in linguistics? What is the cognitive explanation of such an operation?
ChatGPT’s “solution”:
After entering the model, forms can only grow (think of iconicity or Natural Morphology; and yes, Wolfgang U. Dressler was my PhD advisor, and he, like ChatGPT and me, was against backformation). Using the classical example of backformation just mentioned, editor → edit: for ChatGPT, both edit (token ID 9204) and editor (token ID 9836) are single tokens (https://platform.openai.com/tokenizer, GPT-4o). Interestingly, editor has the higher ID number (9836 > 9204), meaning its vocabulary entry was created later. Token IDs roughly reflect the order in which the tokenizer’s byte-pair merges were learned: the more frequent a pattern in the training data, the earlier it is merged and the smaller its ID. The form edit is more frequent because it occurs both on its own and as part of editor. So there is no destruction, only growth, i.e. reuse of shorter forms; cf. Manova 2011, Understanding Morphological Rules (Studies in Morphology 1, Springer, https://link.springer.com/book/10.1007/978-90-481-9547-3, with a foreword by W. U. Dressler). But who reads a female Bulgarian?
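The frequency-to-ID logic can be sketched with a toy example. This is a minimal illustration, not the real GPT-4o tokenizer: actual BPE vocabularies are learned over byte sequences from a huge corpus, but the principle that more frequent material receives smaller IDs can be mimicked by ranking forms by frequency. The tiny corpus and the substring counting below are invented stand-ins for that process.

```python
# Toy sketch (NOT the real GPT-4o tokenizer): frequency-ranked vocabularies
# give more frequent strings smaller IDs. "edit" occurs on its own AND
# inside "editor"/"edits", so it outranks them and gets the smallest ID.
from collections import Counter

corpus = ["edit", "edit", "edit", "editor", "edit", "editor", "edits"]

# Count substring occurrences as a crude stand-in for subword frequency.
freq = Counter()
for word in corpus:
    for form in ("edit", "editor", "edits"):
        if form in word:
            freq[form] += 1

# Assign IDs by descending frequency: the most frequent form gets ID 0.
vocab = {form: i for i, (form, _) in enumerate(freq.most_common())}
# vocab["edit"] is smaller than vocab["editor"]: the shorter, more
# frequent form "entered the system" first.
```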
Borrowed morphology
Borrowing does not actually exist, because the borrowed element remains fully in the possession of the donor language. Thus, it is not “borrowing” in the strict sense; the item can still be used freely in the donor language and may behave differently in the recipient language.
ChatGPT’s “solution”:
A very simple example would be OK. In English (Latin script), OK is a single token (ID 11339). In Bulgarian (Cyrillic script, ОК), it is two separate tokens, “О” and “К” (IDs 4663 and 3682, respectively). Is this the same item in the two languages, or a different one in Bulgarian? Think of tokenization as a kind of psycholinguistic processing. Could it be that treating a borrowed item as the same in the donor and the recipient language is, at the very least, misleading?
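The script difference is visible even below the tokenizer, at the byte level. A minimal sketch in plain Python (the token IDs quoted above are from the OpenAI tokenizer page and are not reproduced here): Latin OK and Cyrillic ОК consist of different code points, so a byte-level tokenizer never even sees them as the same raw material.

```python
# Latin "OK" vs Cyrillic "ОК" (U+041E, U+041A): visually near-identical,
# but different code points, hence different byte sequences for any
# byte-level (BPE) tokenizer.
latin = "OK"
cyrillic = "\u041e\u041a"  # Cyrillic О + Cyrillic К

assert latin != cyrillic                   # not the same string at all
assert len(latin.encode("utf-8")) == 2     # two ASCII bytes
assert len(cyrillic.encode("utf-8")) == 4  # two 2-byte UTF-8 sequences
```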
General point
If we assume that native material (in the donor language) and borrowed material (in the recipient language) are the same substance — as has been done for morphological borrowing in contact linguistics so far — this would imply that information about the origin (including age, i.e. diachrony) of the building blocks does not play a role in language. If this is so, we may further ask: What does research in backformation, borrowing, and historical linguistics in general contribute to our understanding of language?
ChatGPT does not collect diachronic information; it periodically updates the whole system (think of the update of a computer operating system, an OS). Put differently, the stored files (e.g., knowledge from books) remain intact and new knowledge can be added, but this type of knowledge does not command the system; the OS does. The old OS, however, is deleted (not edited) and replaced with a new one. In this way, ChatGPT, and computers in general, always work only at a synchronic level and do not need diachronic information. Ignoring diachrony has an immense impact on complexity: it greatly optimizes processing and brings many other advantages, including ones related to the security and stability of the system. In other words, there is a very simple way for linguists to significantly increase the efficiency of linguistic theory: they just need to (1) always separate synchrony and diachrony (my PhD advisor, W. U. Dressler, always told me this, and that mixing synchrony and diachrony results in a salad) and (2) stop using historical linguistics for theory building (this point is my own, but I am sure that, having read my explanation, Dressler would agree with me). So, an update of the OS would do the job.
Additionally, since in ChatGPT all languages are situated in a shared representational space, linguistic borrowing occurs naturally (even between languages using different scripts, as in the Bulgarian–English example above). Thus, no control for a feature “±borrowed” is necessary, and each language keeps its identity, i.e. it remains itself.
Other things could also be said, but let me stop here. Maybe I should only add that when I wrote earlier that “a math-gifted person often hears…”, I should have specified that in the math-gifted community, math-gifted people teach math-gifted people. As for “listen to your belly”: the intuition of math-gifted persons is so strong (this coordination of belly and head) that they do not need to train it — they need to be careful not to ruin it, because the educational system can spoil this intuition. Then, math-gifted persons think very quickly and they just know; they do not go consciously through all the steps I made explicit in this and previous messages.
I explained the logic in steps (cf. the step-by-step “reasoning” of ChatGPT — trained by math-gifted people!) to draw your attention to the fact that mathematics is not an abstract science. At least, it is less abstract than theoretical linguistics. Mathematics uses more complex analogies than linguistics, i.e. in mathematics the brain is allowed to fly. In linguistics the brain is restricted through theoretical assumptions and extralinguistic factors (e.g., fame or seniority), which often makes solutions inefficient or even absurd, as we could see in the cases of backformation and borrowed morphology above.
Best,
Stela
PS: The call for participation and 5-min talks in my workshops will be sent out on Monday, November 17 to different lists, including LinguistList, and posted on Gauss:AI Global and MANOVA AI websites. Registration will open on December 1, 2025. Tickets are limited. December 1–31, 2025 — early birds. I am mentioning this because the workshop series announcements have not been banned by the LingTyp list moderator. Thank you, colleagues, for your understanding!
> On 13.11.2025, at 17:46, Stela MANOVA <manova.stela at gmail.com> wrote:
>
> Dear all,
>
> Several colleagues wrote to me privately in response to the task, and I would like to thank you all for your engagement. Quite a few suggested crossword or rebus solving/constructing as a possible answer. It seems that crosswords / rebuses resonate intuitively with linguists when thinking about form–meaning relations.
>
> One of the private messages made me particularly happy: it included an additional step, namely further checking my solution against a real-life situation, something a mathematically trained person would do, too. I will not go into detail here, but I appreciate that colleague’s thinking very much; it showed the kind of independent reasoning I had hoped the task would inspire. Overall, I was surprised by the colleagues who wrote to me privately: famous scholars, and yet shy!
>
> Now, to make the math-based approach I revealed to you more directly applicable to linguistic analysis, let us try something new, namely evaluating linguistic terminology. In other words, using parallels from real-world activities, we will now examine linguistic terms. I suggest backformation and borrowed morphology as an easy (and fun) starting point. Note that ChatGPT has neither of them.
>
> Let us give ourselves three days this time, i.e. the time limit ends on November 16 (Sunday) at the exact time when this message is sent. But unlike last time, I would like to reward quick thinkers — just like at a serious math event — so feel free to post your solutions the moment they occur to you. I am looking forward to seeing what comes first, and I hope this round will be livelier publicly!
>
> Best,
>
> Stela
>
>
>
>> On 11.11.2025, at 22:01, Stela MANOVA <manova.stela at gmail.com> wrote:
>>
>> Hello everyone,
>>
>> The time limit for problem solving is over. I hope you enjoyed the process of looking for a solution. As promised, here is my answer.
>>
>> An activity in the real world that clearly separates meaning and form is film recording. In filmmaking, picture and sound are recorded on separate tracks and can exist independently. The picture track corresponds to meaning (think of a silent film), while the sound or written script corresponds to form (think of radio broadcasting or a play script, respectively). Picture and sound/script can be stored on their own data carriers and later synchronized. Thus, the goal here is not to deny the interconnection of form and meaning, but to show that their (temporary) separation and later synchronization are familiar operations for the human brain.
>>
>> Thanks to the mediation of the human brain, even before synchronization, we can produce sound or text based on the picture, and conversely, we can reconstruct the picture based on sound or text. However, the two operations are not the same, and reconstruction based on form appears more stable and precise, while reconstruction based on meaning is less so. I think this explains why listening to the radio and reading a text are normal activities, whereas silent films are no longer produced. In other words, with language generation based on form, the ChatGPT creators made the perfect choice.
>>
>> At this point, a math-gifted person will start looking for similar phenomena in other scientific fields to ensure that this is indeed the right analogy to the separation of meaning and form. Analogous examples do exist — I could find parallels in mathematics, physics, and biology. Intriguingly, each scientific field has its own version of the phenomenon. But I am not sure whether it is a good idea to mention all the examples here; the amount of information could feel overwhelming for untrained brains (recall that my brain was trained for ten years, and I am math-gifted, which further facilitates analogizing and problem solving for me).
>>
>> Thus, let me stop here and see your answers now. I am very excited! Please introduce your findings in the way I have done above and explain why you believe that your examples show a separation of meaning and form and allow for reconstruction of each aspect from the other. Maybe we should wait for a few posts and then start the discussion.
>>
>> Looking forward to your examples and explanations.
>>
>> Best,
>>
>> Stela
>>
>>
>>
>>
>>> On 10.11.2025, at 03:14, Anna Alexandrova <anna.alexandrova at uniroma1.it> wrote:
>>>
>>> Dear Stela,
>>>
>>> Thank you for this Big Yus Ѫ energy moment (is енергия на голям юс the correct wording in Bulgarian?). I suspect many of us are a bit exhausted for a challenge at the moment, though.
>>>
>>> We did enjoy our math and coding classes back in school, even if we might remember them less intensely than you do.
>>>
>>> Best regards,
>>>
>>> A.
>>>
>>>
>>> On Sun, 9 Nov 2025 at 22:18, Stela MANOVA via Lingtyp <lingtyp at listserv.linguistlist.org <mailto:lingtyp at listserv.linguistlist.org>> wrote:
>>>> Dear colleagues,
>>>>
>>>> It turned out that the topic of LLMs is exactly like the topic of language — everybody feels competent, irrespective of their qualifications. In what follows, I would like to address some misinformation that appeared in relation to AI in recent messages on this list: that LLMs are designed to generate costs and therefore end virtually every answer with a question, and that they are “dangerous” because they operate in an entirely inhuman way, based only on form.
>>>>
>>>> As many of you know, my work focuses on ChatGPT, so I will use it in my examples:
>>>>
>>>> ChatGPT knows many things but cannot start a conversation. It needs prompts — i.e., contextual anchors — to select the next token. This is why it often ends answers with a question: not because it “thinks of money,” but because it seeks additional input. The larger the input, the better the answer. Note that the fact that AI is reactive, not proactive, places the human in control of the machine.
>>>>
>>>> LLMs are not mere text collections but a triumph of human intelligence. Surely you do not believe that if you had the whole Internet in text format, this huge amount of text would start speaking like a human by itself. Behind LLMs lies immense conceptual and mathematical work, all done by mathematically gifted humans. I describe the mindset of such people in my paper https://ling.auf.net/lingbuzz/008998 <https://ling.auf.net/lingbuzz/008998>, including how the creators of ChatGPT arrived at the idea of representing language as a linear sequence of tokens. The idea of putting all languages in a shared representational space is equally remarkable: that way they get everything (grammar, semantics, typology, etc.) for free, so to speak; data self-classify, and the model can work even with pieces of data and in many languages simultaneously. Compare this with the linguistic approach: each language is described separately; we compare data only when (complete) language descriptions are ready, and the transfer of classificational features from language to language is not always obvious or smooth.
>>>>
>>>> As those of you who have read the paper mentioned above know, I was educated as a math-gifted student for ten years. What I do not mention in the paper is that my brain was trained for mathematical thinking at least five hours a day, every day except Sundays — yes, for ten years — to form the necessary neural connections. And yes, I learned university mathematics in my teens. A mathematically gifted child often hears three things:
>>>>
>>>> i) All problems are already solved in the real world; you only need to find the right analogy.
>>>>
>>>> ii) It is the belly that knows, not the head. Listen to your belly!
>>>>
>>>> iii) All problems have more than one solution.
>>>>
>>>> Allow me now to propose a short experiment related to the alleged “inhuman” way LLMs treat language — namely, by separating meaning and form. (By “separation” I mean only that the two aspects can be processed or represented independently for a while. By the mediation of the human brain, they both then appear meaningful.) The goal of the experiment is to make you experience the way of thinking of a math-gifted person and to demonstrate that there is an analogy to the ChatGPT approach to language in the real world.
>>>>
>>>> So, the task: Can you find an activity in the real world where form and meaning are separated, yet each can be reconstructed from the other? If you can, the separation of meaning and form is not alien to the human brain.
>>>>
>>>> Following the spirit of mathematical training — where there is always a time limit, including a period for full concentration (with a “no restroom” rule) — I propose the following:
>>>>
>>>> It is now November 9, 22:00 CET, and I am giving you two days (until November 11, 22:00 CET) to find the activity described in the task (it could be from any sphere of human life). During this period, I kindly ask that no messages be posted in this thread, so that we can all focus on the task (the “no restroom” rule does not apply).
>>>>
>>>> After the time limit elapses, we will collect our findings and discuss them. I have already solved the task (and mentioned the solution in an exchange with a well-known neurolinguist — I hope this does not spoil the experiment). On the third day, I will share my answer and look forward to hearing yours.
>>>>
>>>> I hope this demonstration will bring you closer to the thinking behind ChatGPT and show that there is nothing “dangerous” in the model — only a new, fascinating way of representing language grounded in the real world. (Think of the shared representational space as the Earth where all humans live.)
>>>>
>>>> Best wishes,
>>>>
>>>> Stela / Gauss:AI Global
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Lingtyp mailing list
>>>> Lingtyp at listserv.linguistlist.org <mailto:Lingtyp at listserv.linguistlist.org>
>>>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>>>
>