[Lingtyp] "AI" and linguistics problem sets
Chao Li
chao.li at aya.yale.edu
Tue Nov 11 13:56:34 UTC 2025
Dear Mark and All,
I agree that generative A.I. has become so powerful that instructors often
cannot tell for sure whether a student’s work is original/authentic or
generated by A.I. (and then modified by the student). With respect to
“home”work, I nowadays find myself spending a great deal of valuable time
determining whether, and to what extent, a student’s work is authentic, and
this often makes grading a painful and time-consuming experience. For these
reasons, I support the tech-free assessments mentioned by you and many
others. Relatedly, I’d like to share an opinion piece by Anastasia Berg,
who also advocates “[c]reating tech-free spaces and incentivizing
students to spend time in them”.
Best regards,
Chao
On Mon, Nov 10, 2025 at 11:56 PM Mark Post via Lingtyp <
lingtyp at listserv.linguistlist.org> wrote:
> Dear Juergen/All,
>
> First, thanks very much to you and others who have responded both on-list
> and off so far - it has been fascinating to find that there are such
> different perspectives on this issue, ranging from “my students simply
> don’t use AIs to do their work, so I see no problem with continuing to use
> problem sets in assessments” to “I know that my students will use AIs on
> problem sets no matter what I tell them, so I only ever give locked-down
> exams”. And of course, quite a few people hoping - like me - for some sort
> of middle ground.
>
> I want to respond to Juergen’s question now because I’m afraid that the
> discussion is getting a bit derailed by thoughts about the nature/features
> of LLMs and what they are or aren’t doing with text-based language data,
> which - while interesting - is really *not* what I wanted to focus on
> (though I would love to follow a discussion along those lines in another
> thread).
>
> To try to refocus:
>
>
>    1. I’m asking specifically about long-form, *dataset-based problems*,
> of the type found here:
> https://pages.uoregon.edu/tpayne/problem_sets.htm. The reason I want to
> focus on them, rather than on some other method of assessment, is that
> they can be effectively designed to mimic many aspects of a descriptive
> linguist’s work process (organising large-ish sets of data, identifying
> gaps, performing distributional analysis, testing different hypotheses,
> and - yes - reasoning abductively). Depending on various factors,
> “solving” a dataset-based problem can take a lot of time, and so such
> problems are traditionally, in my experience, presented as “homework”.
> Doing so has generally been a good thing, because it gives students the
> time that most people need to really sit down and work through data, and
> therefore gives them/us the opportunity to learn concepts and develop
> skills through extended application and practice (typically more effective
> than a studied-for, timed exam). For decades already, there has been a
> sort of arms race against the internet, as good answers to most standard
> problem sets have long been posted (somewhere…); but through
> substitution/cloaking (“Language X”, “some features may have been
> altered”...), and/or the use of private/unpublished datasets, I trust that
> most people, like me, have been managing to cope. Until now, when it has
> become possible to use chatbots to develop solutions to problems for which
> no human-generated solution exists anywhere on the internet...
>
>
>
>    2. I’m asking specifically about their use as/in *assessment*, i.e.
> giving some students more points for a “better” solution than others, and
> having this contribute to their grade for the class. Leaving aside any
> pedagogical or ethical concerns around assessment, student performance
> ranking, etc., it seems to be a fact that - for better or for worse - most
> universities, and their government accrediting bodies, are going to
> continue to demand this of their linguistics teachers. So the question is
> whether and how problem sets (or anything similar) can be used in/as
> assessments, in view of the following:
>
>
>
> 3. LLMs are simply getting much better at solving dataset-based
> problems of almost any type, and given the performance improvements over
> the past 12 months or so, I think we need to assume that they will keep
> getting better. Last year at this time ChatGPT was terrible at working with
> Salishan languages; you couldn’t even get it to find the verb root, it
> would just hack away at the stem phonology like a lunatic. Today, it’s not
> perfect, but it is definitely much, much better. The likely reasons for
> this are potentially interesting, but in a sense irrelevant. The question
> is whether there is *any* type of human-solvable dataset-based problem
> that an LLM *must* (by definition, in some sense) continue to find *completely
> intractable*, and I think we *have to* assume that the practical
> answer, as far as a linguistics teacher/class is concerned, can only be
> “no”. [And that’s leaving aside my earlier observation that we probably
> don’t want to be assessing students exclusively on their performance at
> analysing the small set of putatively “LLM-proof” languages that may for
> some reason exist in the world.]
>
>
>
> 4. The question in this context is not really whether LLMs can or
> can’t, as a matter of principle, do the work of distributional analysis in
> the same way as/as well as a well-trained human linguist. I have my own
> semi-informed opinions about this, but the question here is whether they
> can do it *well enough* to enable Student A, who outsourced their
> assignment completely to an LLM, to get a mark that is *not much worse* than
> that of Student B, who spent their entire weekend doing it by hand (yep -
> that is what we used to do…). Because if the answer is *ever* “yes” -
> and I would argue that, in *most* cases, it is *currently* “yes” -
> then this is no longer a useful or meaningful form of assessment (unless it
> is modified in some way, hence my question).
>    5. There are all kinds of cool and interesting tricks for detecting AI
> use in circulation, and I have indeed used some of them (most recently,
> text in background-coloured font which, while invisible to most users,
> instructs an LLM to insert a flagword in its response - much like the
> “watermark” of Juergen’s suggestion; a minimal sketch of how such a hidden
> prompt can be embedded follows this list). However, two things on this
> point: one, it’s just as unrealistic to expect a linguistics teacher to
> stay one step ahead of their students in the latest AI-detection
> techniques as it is to expect them to know the current shortlist of
> putatively LLM-proof languages (if anything, it is even less realistic).
> The field just moves too quickly, and even the best detection method can
> only be used once and will only catch out the least tech-aware among one’s
> students - which is not really the point. Two, my sense is that many
> university administrations are adopting an adversarial posture
> <https://educational-innovation.sydney.edu.au/teaching@sydney/false-flags-and-broken-trust-can-we-tell-if-ai-has-been-used/> towards
> AI-detection by teachers. Why this should be the case is an interesting
> (if depressing) topic to contemplate, but it at least implies that
> teachers should not assume AI-detection to fall within their reasonable
> (and safe…) range of responsibilities.
>
>
> In sum: data sets as homework: pedagogically good in the absence of AI,
> unsound in the presence of AI. What can be done? If anything?
>
> One approach we’re planning to test out here (and which I’d be interested
> to hear anyone’s opinions about, if they’ve tried anything similar - I
> realise that there are potentially some institution-specific
> policy/logistical issues involved) is to move problem-set “homework”
> assessments into the tutorial/section/classroom/laboratory context, and
> have students work in groups to complete the problem set, which they then
> submit at the end of the hour. There are some obvious pros and cons to
> this approach, but right now we’re more or less setting the low bar of,
> well - trying to get at least some students to do any sort of work at all.
>
> Thanks again everyone
> Mark
>
>
> *From: *Juergen Bohnemeyer <jb77 at buffalo.edu>
> *Date: *Friday, 7 November 2025 at 4:06 am
> *To: *Mark Post <mark.post at sydney.edu.au>, typology list <
> lingtyp at listserv.linguistlist.org>
> *Subject: *Re: "AI" and linguistics problem sets
>
> Dear Mark — I’m actually surprised to hear that an AI bot is able to
> adequately solve your problem sets. My assumption, based on my own very
> limited experience with ChatGPT, has been that LLMs would perform so poorly
> at linguistic analysis that the results would dissuade students from trying
> again in the future. Would it be possible at all to share more details with
> us?
>
> (One recommendation I have, though I haven’t actually tried it out, is to
> put a watermark of sorts in your assignments, in the form of a factual
> detail about some lesser-studied language. Even though such engines are of
> course quite capable of information retrieval, their very nature seems to
> predispose them toward predicting the answer rather than looking it up,
> with the results likely being straightforwardly false.)
>
> Best — Juergen
>
>
> Juergen Bohnemeyer (He/Him)
> Professor, Department of Linguistics
> University at Buffalo
>
> Office: 642 Baldy Hall, UB North Campus
> Mailing address: 609 Baldy Hall, Buffalo, NY 14260
> Phone: (716) 645 0127
> Fax: (716) 645 3825
> Email: *jb77 at buffalo.edu <jb77 at buffalo.edu>*
> Web: *http://www.acsu.buffalo.edu/~jb77/*
>
>
> Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID 585
> 520 2411; Passcode Hoorheh)
>
> There’s A Crack In Everything - That’s How The Light Gets In
> (Leonard Cohen)
>
> --
>
>
> *From: *Lingtyp <lingtyp-bounces at listserv.linguistlist.org> on behalf of
> Mark Post via Lingtyp <lingtyp at listserv.linguistlist.org>
> *Date: *Tuesday, November 4, 2025 at 18:27
> *To: *typology list <lingtyp at listserv.linguistlist.org>
> *Subject: *[Lingtyp] "AI" and linguistics problem sets
>
> Dear Listmembers,
>
> I trust that most lingtyp subscribers will have engaged with “problem
> sets” of the type found in Language Files, Describing Morphosyntax, and my
> personal favourite oldie-but-goodie, the Source Book for Linguistics. Since
> the advent of ChatGPT, I’ve been migrating away from these (and even from
> edited/obscured versions of them) for assessments, and relying more and
> more on private/unpublished data sets, mostly from languages with lots of
> complex morphology and less familiar category types, which LLMs seemed to
> have a much harder time with. This was not an ideal situation for many
> reasons, not least because these were not the only types of languages
> students should get practice working with. But the problem really came to
> a head this year, when I found that perhaps most off-the-shelf LLMs were
> now able to solve almost all of my go-to problem sets to an at least
> reasonable degree, even after I had obscured much of the data.
>
> Leaving aside issues around how LLMs work, what role(s) they can or should
> (not) play in linguistic research, etc., I’d like to ask if any listmembers
> would be willing to share their experiences, advice, etc., specifically in
> the area of student assessment in the teaching of linguistic data analysis,
> and in particular morphosyntax, in the unfolding AI-saturated environment.
> Is the “problem set” method of teaching distributional analysis
> irretrievably lost? Can it still be employed, and if so how? Are there
> different/better ways of teaching more or less the same skills?
>
> Note that I would really like to avoid doomsdayisms if possible here (“the
> skills traditionally taught to linguists have already been made obsolete by
> AIs, such that there’s no point in teaching them anymore” - an argument
> with which I am all-too-familiar), and focus, if possible, on *how* it is
> possible to assess/evaluate students’ performance *under the assumption* that
> there is at least some value in teaching at least some human beings how to
> do a distributional analysis “by hand” - such that they are actually able,
> for example, to evaluate a machine’s performance in analysing a
> new/unfamiliar data set, and under the further assumption that
> assessment/evaluation of student performance in at least many institutions
> will continue to follow existing models.
>
> Many thanks in advance!
> Mark
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Why Even Basic A.I. Use Is So Bad for Students ChatGPT generative AI 2025.pdf
Type: application/pdf
Size: 125226 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/lingtyp/attachments/20251111/76f42a3a/attachment-0001.pdf>