[Lingtyp] Modeling Language Without Language – A Terminology Wake-Up Call

Stela Manova manova.stela at gmail.com
Thu May 1 15:05:31 UTC 2025


Something unbelievable happened to me: I wanted to write a short note on how the term token is used in linguistics and in computer science, and to post it here. It turned out that far more needed to be said, so here is the long version of my terminology wake-up call.



Modeling Language Without Language: A ChatGPT Lesson for Language Research

(Proofread and edited using ChatGPT)



ChatGPT enthusiastically proposed the following announcement:



What does ChatGPT actually “see”? Not words. Not morphemes. Just tokens.

This paper explores how linguists' misreadings of computer science terms such as token and tree have shaped (and distorted) research on LLMs, and why that matters for usage-based and generative linguistics alike.

📌 Linguists, beware: a token is not a word.

📎 PDF / preprint: https://ling.auf.net/lingbuzz/008998


#ChatGPT #LLMs #linguistics #computationalLinguistics #terminology #tokenization #generativeGrammar #usageBasedLinguistics

—

All the best, 

Stela
