<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Dear all — There’s quite a bit of distortion here of what I said, unintentionally I’m sure. Still, I feel I need to clarify:</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
First off, I didn’t say “all semantic typology research must use primary data”. I said
<i>most</i> does. </div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Secondly, Bill says “Documentary/descriptive linguists do not just 'abstract away from individual speakers and attribute certain properties to entire linguistic varieties and speech communities'. Their descriptions are based on 'primary data', and frequently
describe variation, contexts of use, interactional phenomena, social attitudes, socially governed differences in language behavior, etc. These are valuable generalizations about a language as a community entity.” However, I did not in any way, shape, or form
suggest that description is based on, or even aims to produce, secondary data. I didn’t in fact comment on practices of language description/documentation
<i>at all. </i>What I said is that secondary data typology <i>uses</i> results of language descriptions as secondary data.</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
And lastly, I have no idea where Bill is getting this from:</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div dir="ltr"><span style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">"There seems to be a purist view here that some data is perfect ("perfectly natural", "perfectly controlled",
or whatever), and other data is so flawed as to be useless (see Juergen’s 25 Nov email below)</span>”</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);"><br>
</span></div>
<div dir="ltr"><span style="background-color: rgb(255, 255, 255);">I didn’t use the words “perfect” and “flawed” at all. What I was commenting on is that secondary data typology, by virtue of reducing entire languages to single observations, ignores vast amounts
of information about them. In the past, this was inevitable because there was no reasonable alternative. This is now slowly changing, largely as a result of technological advances. So if we can do better, we will, unless you expect science
to stagnate or backslide. At the same time, I’m sure secondary-data typology will remain an important part of the toolkit, particularly as a means of aggregating
</span>primary data.</div>
<div dir="ltr"><br>
</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);">Reducing languages to single observations was until now a necessary idealization, of a kind that recurs throughout the history of science. Consider for example grammaticality judgments: until recently, syntacticians
were basing their generalizations on categorizing sentences dichotomously as grammatical or ungrammatical. Now the field is slowly changing to open itself up to more nuanced evidence from psycholinguistics and corpus linguistics. I see the role of primary
data in typology as a rather close analogy to that.</span></div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);"><br>
</span></div>
<div dir="ltr"><span style="background-color: rgb(255, 255, 255);">Best — Juergen </span></div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);"><br>
</span></div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="ms-outlook-mobile-signature">
<p style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;"><span style="font-family: Helvetica; font-size: 9pt; color: black;">Juergen Bohnemeyer (He/Him)<br>
Professor, Department of Linguistics<br>
University at Buffalo <br>
<br>
Office: 642 Baldy Hall, UB North Campus<br>
Mailing address: 609 Baldy Hall, Buffalo, NY 14260 <br>
Phone: (716) 645 0127 <br>
Fax: (716) 645 3825<br>
Email: </span><span style="font-family: Helvetica; font-size: 9pt; color: rgb(0, 120, 212);"><u><a href="mailto:jb77@buffalo.edu" title="mailto:jb77@buffalo.edu" data-outlook-id="dda09e6a-2e5c-4d07-a117-6cc8339fd841" style="color: rgb(0, 120, 212); margin-top: 0px; margin-bottom: 0px;">jb77@buffalo.edu</a></u></span><span style="font-family: Helvetica; font-size: 9pt; color: black;"><br>
Web: </span><span style="font-family: Helvetica; font-size: 9pt; color: rgb(5, 99, 193);"><u><a href="http://www.acsu.buffalo.edu/~jb77/" title="http://www.acsu.buffalo.edu/~jb77/" data-outlook-id="b02f2809-724a-47de-911f-0edecc9e123f" style="color: rgb(5, 99, 193); margin-top: 0px; margin-bottom: 0px;">http://www.acsu.buffalo.edu/~jb77/</a></u></span><span style="font-family: Helvetica; font-size: 9pt; color: black;"> <br>
<br>
</span><span style="color: black;">Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID 585 520 2411; Passcode Hoorheh) </span><span style="font-family: Helvetica; font-size: 9pt; color: black;"><br>
<br>
There’s A Crack In Everything - That’s How The Light Gets In <br>
(Leonard Cohen) </span></p>
<p style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;">-- </p>
<p style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;"> </p>
</div>
<div id="mail-editor-reference-message-container">
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"></div>
<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: Aptos; font-size: 12pt; color: black;">
<b>From: </b>William Croft &lt;wacroft@icloud.com&gt;<br>
<b>Date: </b>Monday, November 24, 2025 at 11:52<br>
<b>To: </b>Juergen Bohnemeyer &lt;jb77@buffalo.edu&gt;, Linguistic Typology &lt;lingtyp@listserv.linguistlist.org&gt;<br>
<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic frequencies<br>
<br>
</div>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px;">
Some comments on Juergen's email, starting from the end.</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 14px;">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px;">
It is not the case that all funding is directed towards "secondary data". There are quite a few sources for funding language documentation, not to mention sources for funding experimental psycholinguistics. My impression is that it is very difficult to obtain
funding for a typological project based solely on "secondary data", that is, data collected solely from descriptive materials.</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 14px;">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px;">
It is not the case that all semantic typological research must use "primary data" "out of sheer necessity". See for example the publications of Brown and Witkowski (e.g. Brown 1984), the Database of Semantic Shifts (https://datsemshift.ru/), or a paper from
a project I was involved with (Youn et al. 2016). Conversely, it is not the case that phonetic typology is entirely based on "secondary data"; see for example the experimental research described in Ladefoged and Maddieson (1996)<i>. </i></p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 14px;">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px;">
Himmelmann (1998) has a more nuanced description of "primary data" and its relation to "secondary data". For example: 'There are generally three components to each document (piece of data), viz. the “raw” data in various forms of representation (transcription,
tape, and/or video), a translation (word-by-word/interlinear and free), and a commentary providing additional information as to recording circumstance, linguistic and cultural peculiarities associated with the data segment, comments by native speakers cooperating
in the transcription and translation of the segment, problems encountered in transcribing and translating, further data elicited in connection with the segment, etc. In short, everything that happened during recording, transcribing, and translating the data
(and eliciting, in the case of elicited data)' (pp. 169-170). Note that 'document (piece of data)' includes transcription, IMT and translation, the elements of a text corpus. The main problem with traditional text corpora is that they are incomplete: often
lacking audio or video, having only a minimal presence of the third component (the metadata captures only a fraction of it), and covering only a restricted selection of discourse types (Himmelmann 1998:166ff). But they are not worthless. For many languages, they are all
that we have of any form of discourse.</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 14px;">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px;">
Finally, a language as a community entity is more than just a set of individual speakers' productions. There is a social dimension to language and language use (not to mention also a cognitive dimension of speaker intentions in social interactions involving
language). Documentary/descriptive linguists do not just 'abstract away from individual speakers and attribute certain properties to entire linguistic varieties and speech communities'. Their descriptions are based on "primary data", and frequently describe
variation, contexts of use, interactional phenomena, social attitudes, socially governed differences in language behavior, etc. These are valuable generalizations about a language as a community entity.</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 14px;">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px;">
There seems to be a purist view here that some data is perfect ("perfectly natural", "perfectly controlled", or whatever), and other data is so flawed as to be useless (see Juergen’s 25 Nov email below). No data is perfect, and all data is useful, even if it
must be taken with a grain of salt.</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 14px;">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px;">
Bill</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 14px;">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: justify; text-indent: -18px; line-height: normal; margin: 0px 0px 0px 18px; font-family: "Times New Roman";">
Brown, Cecil H. 1984. <i>Language and living things</i>. New Brunswick: Rutgers University Press.</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 15px; font-family: "Times New Roman";">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: justify; text-indent: -18px; line-height: normal; margin: 0px 0px 0px 18px; font-family: "Times New Roman";">
Ladefoged, Peter & Ian Maddieson. 1996. <i>The sounds of the world’s languages.</i> Oxford: Basil Blackwell.</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; min-height: 15px; font-family: "Times New Roman";">
<br>
</p>
<p class="ms-outlook-mobile-reference-message skipProofing" style="line-height: normal; margin: 0px; font-family: "Times New Roman";">
Youn, Hyejin, Logan Sutton, Eric Smith, Cristopher Moore, Jon F. Wilkins, Ian Maddieson, William Croft and Tanmoy Bhattacharya. 2016. On the universal structure of human lexical semantics.
<i>Proceedings of the National Academy of Sciences </i>113(7).1766-71.</p>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>
</div>
<blockquote>
<div class="ms-outlook-mobile-reference-message skipProofing">On Nov 24, 2025, at 8:04 AM, Juergen Bohnemeyer via Lingtyp &lt;lingtyp@listserv.linguistlist.org&gt; wrote:</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Sorry, just to clarify further: by “generalizations over languages”, I didn’t mean typological generalizations; I meant descriptive statements about individual languages. Those are generalizations in the sense that they abstract away from individual speakers
and attribute certain properties to entire linguistic varieties or speech communities. That’s the nature of secondary data in my view. — Best — Juergen</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div id="mail-editor-reference-message-container">
<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: Aptos; font-size: 12pt;">
<b>From: </b>Lingtyp &lt;lingtyp-bounces@listserv.linguistlist.org&gt; on behalf of Juergen Bohnemeyer via Lingtyp &lt;lingtyp@listserv.linguistlist.org&gt;<br>
<b>Date: </b>Monday, November 24, 2025 at 09:47<br>
<b>To: </b>Mira Ariel &lt;mariel@tauex.tau.ac.il&gt;, Martin Haspelmath &lt;martin_haspelmath@eva.mpg.de&gt;, Peter Arkadiev &lt;peterarkadiev@yandex.ru&gt;, Linguistic Typology &lt;lingtyp@listserv.linguistlist.org&gt;<br>
<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic frequencies<br>
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Dear all — I’m treating as primary data anything that consists of the speech, or judgments (although those to me have a less “vivid” quality as data), of individual speakers (and analogously for sign language). As opposed to generalizations over languages —
that’s what I mean by secondary data. I’m well aware that corpus data has an in-between status. Perhaps rather than to say that it
<i>is </i>primary data, it would be more appropriate to say that it can be used, to some extent, like primary data.</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Primary data can be the result of spontaneous observation, can consist of recordings of what Himmelmann (1998) calls ‘staged’ discourses, and can be elicited or collected experimentally. I see experimentation and elicitation as cluster concepts that form a
multidimensional continuum (as discussed in my upcoming book on <i>Semantic research: From data to analysis</i>, due out with CUP in January). </div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Today, almost all of morphosyntactic typology and the bulk of phonetic typology are based on secondary data. In contrast, semantic typology (my primary focus) mostly uses primary data, out of sheer necessity, since secondary data is not available. </div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
And to respond to Martin: I really didn’t mean to suggest that we drop secondary data typology right this minute ;-) (I’m actually myself up to my ears in Grambank data these days.) What I’m envisioning is a gradual shift in emphasis over the next couple of
decades, especially when it comes to megaprojects (by typological standards) such as Grambank. Creating the resources needed to get us within striking distance of primary data typology on grammar will require a vast effort, so at some point, typologists and
funders will have to make decisions on which basket they want to place those big eggs (sorry, mixing metaphors again) in, continuing to pour everything into the secondary data basket or gradually shifting emphasis toward funding more primary data projects.</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Best — Juergen</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Himmelmann, N. P. (1998). Documentary and descriptive linguistics. <i>Linguistics</i> 36:161-195.</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div id="mail-editor-reference-message-container">
<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; margin-right: 0in; margin-left: 0in; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: Aptos; font-size: 12pt;">
<b>From: </b>Mira Ariel &lt;mariel@tauex.tau.ac.il&gt;<br>
<b>Date: </b>Monday, November 24, 2025 at 09:14<br>
<b>To: </b>Martin Haspelmath &lt;martin_haspelmath@eva.mpg.de&gt;, Juergen Bohnemeyer &lt;jb77@buffalo.edu&gt;, Peter Arkadiev &lt;peterarkadiev@yandex.ru&gt;, Linguistic Typology &lt;lingtyp@listserv.linguistlist.org&gt;<br>
<b>Subject: </b>RE: [Lingtyp] Reporting cross-linguistic frequencies<br>
<br>
</div>
<div style="margin: 0in 0px; font-family: "Times New Roman", serif; font-size: 12pt; color: rgb(10, 47, 65);">
Hi,</div>
<p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);"> </span></p>
<div style="margin: 0in 0px; font-family: "Times New Roman", serif; font-size: 12pt; color: rgb(10, 47, 65);">
I’m not a typologist, but in semantics/pragmatics research a similar dilemma arises: Corpus data or experimental data? My experience has been that although both have flaws, both can advance our understanding of language. We should just give up on the idea that
we could find the one perfect methodology. That said, there’s plenty of room to criticize what one thinks is a flawed methodology, of course.</div>
<p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);"> </span></p>
<div style="margin: 0in 0px; font-family: "Times New Roman", serif; font-size: 12pt; color: rgb(10, 47, 65);">
Best,</div>
<div style="margin: 0in 0px; font-family: "Times New Roman", serif; font-size: 12pt; color: rgb(10, 47, 65);">
Mira</div>
<p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);"> </span></p>
<div style="margin-right: 0in; margin-left: 0in; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(225, 225, 225) currentcolor currentcolor;">
<div style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;"><b>From:</b> Lingtyp &lt;lingtyp-bounces@listserv.linguistlist.org&gt;
<b>On Behalf Of </b>Martin Haspelmath via Lingtyp<br>
<b>Sent:</b> Sunday, November 23, 2025 11:56 PM<br>
<b>To:</b> Juergen Bohnemeyer &lt;jb77@buffalo.edu&gt;; Peter Arkadiev &lt;peterarkadiev@yandex.ru&gt;; Linguistic Typology &lt;lingtyp@listserv.linguistlist.org&gt;<br>
<b>Subject:</b> Re: [Lingtyp] Reporting cross-linguistic frequencies</div>
</div>
<p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
</p>
<p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">I agree with Peter that the corpus-based methods employed by Hawkins, Wälchli, Cysouw, Levshina and others have been very important,
and also with Jürgen that "when confronting the causal inference problem in typology, we must consider every source of evidence that we can get our hands on."</span></p>
<p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">But I don't agree with Peter that "the whole enterprise [of overcoming genealogical and areal biases] does not appear to be very
productive", and I don't agree with Jürgen that we "must eventually move from secondary data typology to primary data typology".</span></p>
<p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">I think that the enterprise of controlling for family and contact effects is absolutely necessary, because otherwise we cannot
distinguish outcomes of universal/non-historical factors from outcomes of historical events. Peter recognizes this implicitly when he says that we should "combine experimental research ... with a quantitative study of variation in corpora across a small number
of sufficiently distinct languages". That's precisely the point: Which languages are "sufficiently distinct"? And hasn't the search for empirical universals been *highly productive* over the last few decades? The recent paper by Verkerk et al. (2025) has found
good evidence for most of the empirical universals that had been seriously discussed earlier, so the Greenbergian universals seem to be very robust findings compared to many other prestigious claims in linguistics.</span></p>
<p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">And I think that there is no reason to abandon secondary-data typology just because we can also (increasingly) do primary-data
typology. Typological comparison can be done at multiple scales and multiple levels of granularity, and it is not clear that we can dispense with any of these levels. For example, we want to do typology of phonological segments (along the lines of the Phoible.org
database), or typology of word meanings (lexification typology, cf. </span><span style="font-family: Aptos, sans-serif; font-size: 12pt; color: blue;"><u><a href="https://clics.clld.org/" originalsrc="https://clics.clld.org/" data-outlook-id="c1d3d11d-7030-4493-b067-a2fb603bd8e4" style="color: blue; margin-top: 0px; margin-bottom: 0px;">https://clics.clld.org/</a></u></span><span style="font-family: Aptos, sans-serif; font-size: 12pt;">),
and for these, it seems that secondary data will not be easily replaced.</span></p>
<p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">Best,</span></p>
<p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">Martin</span></p>
<div style="margin-right: 0in; margin-left: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
</div>
<div style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">On 21.11.25 16:04, Juergen Bohnemeyer wrote:</div>
<blockquote style="margin-top: 5pt; margin-bottom: 5pt;">
<div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">Dear Peter — I’m a massive fan of corpus-based typology. More broadly, there is no question in my mind that we should, and must, eventually move from secondary data typology to primary
data typology. Nobody seems to deny that secondary data typology is fraught with too many problematic idealizations: in particular, it reduces entire languages to single observations, and it suffers from incomparable decisions on what is treated as a language
in different parts of the world. </div>
<p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
</p>
<div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">(The second problem is closely related to, but not entirely identical with, the countability problem Ian Joo mentions. The fact that
<i>language</i> is a count noun is a powerful illustration of how ordinary language can frame reality in ways that may impede scientific progress if it goes unchecked, as Whorf pointed out. However, actually counting languages is not the issue for regression-based
modeling, since regression models don’t operate on counts. But the question whether what is treated as an observation (i.e., a language) is uniform across the sample is of course very much a concern for the validity of sampling-based and regression-based modeling
alike.)</div>
<p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
</p>
<div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">There is a broader answer to your question, though: as a matter of course, when confronting the causal inference problem in typology (i.e., when hunting for the causal forces that shape
languages), we must consider every source of evidence that we can get our hands on. Aside from corpus-based typology, this includes field-based psycholinguistics and the toolkit of evolutionary linguistics, including simulations and miniature artificial language
experiments. </div>
<p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
</p>
<div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">Let me also suggest a distinction between methods that are primarily geared toward the discovery of typological distributions and the examination of their statistical properties and
methods that can be used to test hypotheses of causal inference (i.e., explanatory hypotheses). Experimental research such as what I just mentioned has its uses primarily for testing explanatory hypotheses. Corpus-based research can have both functions. But
if we want to use corpora to discover typological distributions, we’ll need very large parallel corpus databases, such as are being developed now. </div>
<p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
</p>
<div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">Best — Juergen</div>
</blockquote>
</div>
</div>
</blockquote>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>
</div>
</div>
</body>
</html>