<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">Some comments on Juergen's email, starting from the end.</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 14px;"><br></p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">It is not the case that all funding is directed towards "secondary data". There are quite a few sources for funding language documentation, not to mention sources for funding experimental psycholinguistics. My impression is that it is very difficult to obtain funding for a typological project based solely on "secondary data", that is, data collected solely from descriptive materials.</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 14px;"><br></p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">It is not the case that all semantic typological research must use "primary data" "out of sheer necessity". See for example the publications of Brown and Witkowski (e.g. Brown 1984), the Database of Semantic Shifts (https://datsemshift.ru/), or a paper from a project I was involved with (Youn et al. 2016). Conversely, it is not the case that phonetic typology is entirely based on "secondary data"; see for example the experimental research described in Ladefoged and Maddieson (1996).</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 14px;"><br></p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">Himmelmann (1998) has a more nuanced description of "primary data" and its relation to "secondary data". For example: 'There are generally three components to each document (piece of data), viz. the “raw” data in various forms of representation (transcription, tape, and/or video), a translation (word-by-word/interlinear and free), and a commentary providing additional information as to recording circumstance, linguistic and cultural peculiarities associated with the data segment, comments by native speakers cooperating in the transcription and translation of the segment, problems encountered in transcribing and translating, further data elicited in connection with the segment, etc. In short, everything that happened during recording, transcribing, and translating the data (and eliciting, in the case of elicited data)' (pp. 169–170). Note that 'document (piece of data)' includes transcription, IMT and translation, the elements of a text corpus. The main problem with traditional text corpora is that they are incomplete: they often lack audio or video, they contain only a minimal amount of the third component (the metadata captures only a fraction of it), and they cover a restricted selection of discourse types (Himmelmann 1998:166ff). But they are not worthless: for many languages, they are all that we have of any form of discourse.</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 14px;"><br></p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">Finally, a language as a community entity is more than just a set of individual speakers' productions. There is a social dimension to language and language use (not to mention also a cognitive dimension of speaker intentions in social interactions involving language). Documentary/descriptive linguists do not just 'abstract away from individual speakers and attribute certain properties to entire linguistic varieties and speech communities'. Their descriptions are based on "primary data", and frequently describe variation, contexts of use, interactional phenomena, social attitudes, socially governed differences in language behavior, etc. These are valuable generalizations about a language as a community entity.</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 14px;"><br></p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">There seems to be a purist view here that some data is perfect ("perfectly natural", "perfectly controlled", or whatever), and other data is so flawed as to be useless (see Juergen’s 25 Nov email below). No data is perfect, and all data is useful, even if it must be taken with a grain of salt.</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 14px;"><br></p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">Bill</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 14px;"><br></p>
<p style="margin: 0px 0px 0px 18px; text-align: justify; text-indent: -18px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-family: "Times New Roman"; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">Brown, Cecil H. 1984. <i>Language and Living Things</i>. New Brunswick, NJ: Rutgers University Press.</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-family: "Times New Roman"; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 15px;"><br></p>
<p style="margin: 0px 0px 0px 18px; text-align: justify; text-indent: -18px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-family: "Times New Roman"; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">Ladefoged, Peter & Ian Maddieson. 1996. <i>The sounds of the world’s languages.</i> Oxford: Basil Blackwell.</p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-family: "Times New Roman"; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal; min-height: 15px;"><br></p>
<p style="margin: 0px; font-style: normal; font-variant-caps: normal; font-stretch: normal; line-height: normal; font-family: "Times New Roman"; font-size-adjust: none; font-kerning: auto; font-variant-alternates: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-position: normal; font-feature-settings: normal; font-optical-sizing: auto; font-variation-settings: normal;">Youn, Hyejin, Logan Sutton, Eric Smith, Cristopher Moore, Jon F. Wilkins, Ian Maddieson, William Croft and Tanmoy Bhattacharya. 2016. On the universal structure of human lexical semantics. <i>Proceedings of the National Academy of Sciences </i>113(7).1766-71.</p><div><br><blockquote type="cite"><div>On Nov 24, 2025, at 8:04 AM, Juergen Bohnemeyer via Lingtyp <lingtyp@listserv.linguistlist.org> wrote:</div><br class="Apple-interchange-newline"><div>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Sorry, just to clarify further: by “generalizations over languages”, I didn’t mean typological generalizations; I meant descriptive statements about individual languages. Those are generalizations in the sense that they abstract away from individual speakers
and attribute certain properties to entire linguistic varieties or speech communities. That’s the nature of secondary data in my view. — Best — Juergen</div>
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div id="ms-outlook-mobile-signature"><div style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;"><span style="font-family: Helvetica; font-size: 9pt;">Juergen Bohnemeyer (He/Him)<br>
Professor, Department of Linguistics<br>
University at Buffalo <br>
<br>
Office: 642 Baldy Hall, UB North Campus<br>
Mailing address: 609 Baldy Hall, Buffalo, NY 14260 <br>
Phone: (716) 645 0127 <br>
Fax: (716) 645 3825<br>
Email: </span><span style="font-family: Helvetica; font-size: 9pt; color: rgb(0, 120, 212);"><u><a href="mailto:jb77@buffalo.edu" title="mailto:jb77@buffalo.edu" data-outlook-id="ae8627d5-c000-496e-8629-2b9a306f1640" style="color: rgb(0, 120, 212); margin-top: 0px; margin-bottom: 0px;">jb77@buffalo.edu</a></u></span><span style="font-family: Helvetica; font-size: 9pt;"><br>
Web: </span><span style="font-family: Helvetica; font-size: 9pt; color: rgb(5, 99, 193);"><u><a href="http://www.acsu.buffalo.edu/~jb77/" title="http://www.acsu.buffalo.edu/~jb77/" data-outlook-id="dedd23ee-3870-4a8d-aa25-fea8da739134" style="color: rgb(5, 99, 193); margin-top: 0px; margin-bottom: 0px;">http://www.acsu.buffalo.edu/~jb77/</a></u></span><span style="font-family: Helvetica; font-size: 9pt;"> <br>
<br>
</span><span style="">Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID 585 520 2411; Passcode Hoorheh) </span><span style="font-family: Helvetica; font-size: 9pt;"><br>
<br>
There’s A Crack In Everything - That’s How The Light Gets In <br>
(Leonard Cohen) </span></div><div style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;">-- </div><p style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;"> </p>
</div>
<div id="mail-editor-reference-message-container">
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"></div>
<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: Aptos; font-size: 12pt;">
<b>From: </b>Lingtyp <lingtyp-bounces@listserv.linguistlist.org> on behalf of Juergen Bohnemeyer via Lingtyp <lingtyp@listserv.linguistlist.org><br>
<b>Date: </b>Monday, November 24, 2025 at 09:47<br>
<b>To: </b>Mira Ariel <mariel@tauex.tau.ac.il>, Martin Haspelmath <martin_haspelmath@eva.mpg.de>, Peter Arkadiev <peterarkadiev@yandex.ru>, Linguistic Typology <lingtyp@listserv.linguistlist.org><br>
<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic frequencies<br>
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Dear all — I’m treating as primary data anything that consists of the speech or judgments of individual speakers (although judgments to me have a less “vivid” quality as data), and analogously for sign language. By secondary data, I mean generalizations over languages. I’m well aware that corpus data has an in-between status. Perhaps, rather than saying that it
<i>is</i> primary data, it would be more appropriate to say that it can be used, to some extent, like primary data.</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Primary data can be the result of spontaneous observation, can consist of recordings of what Himmelmann (1998) calls ‘staged’ discourses, and can be elicited or collected experimentally. I see experimentation and elicitation as cluster concepts that form a
multidimensional continuum (as discussed in my forthcoming book <i>Semantic research: From data to analysis</i>, due out with CUP in January). </div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Today, almost all of morphosyntactic typology and the bulk of phonetic typology are based on secondary data. In contrast, semantic typology (my primary focus) mostly utilizes primary data, out of sheer necessity, since secondary data is not available. </div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
And to respond to Martin: I really didn’t mean to suggest that we drop secondary data typology right this minute ;-) (I’m actually myself up to my ears in Grambank data these days.) What I’m envisioning is a gradual shift in emphasis over the next couple of
decades, especially when it comes to megaprojects (by typological standards) such as Grambank. Creating the resources needed to bring primary data typology on grammar within striking distance will require a vast effort, so at some point, typologists and
funders will have to make decisions on which basket they want to place those big eggs (sorry, mixing metaphors again) in, continuing to pour everything into the secondary data basket or gradually shifting emphasis toward funding more primary data projects.</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Best — Juergen</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Himmelmann, N. P. (1998). Documentary and descriptive linguistics. <i>Linguistics</i> 36:161-195.</div>
<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
<br>
</div>
<div id="mail-editor-reference-message-container">
<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; margin-right: 0in; margin-left: 0in; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: Aptos; font-size: 12pt;">
<b>From: </b>Mira Ariel <mariel@tauex.tau.ac.il><br>
<b>Date: </b>Monday, November 24, 2025 at 09:14<br>
<b>To: </b>Martin Haspelmath <martin_haspelmath@eva.mpg.de>, Juergen Bohnemeyer <jb77@buffalo.edu>, Peter Arkadiev <peterarkadiev@yandex.ru>, Linguistic Typology <lingtyp@listserv.linguistlist.org><br>
<b>Subject: </b>RE: [Lingtyp] Reporting cross-linguistic frequencies<br>
<br>
</div><div style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);">Hi,</span></div><p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);"> </span></p><div style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);">I’m not a typologist, but in semantics/pragmatics research a similar dilemma arises: Corpus data or experimental data? My experience has been that although both have flaws, both can
advance our understanding of language. We should just give up on the idea that we could find the one perfect methodology. That said, there’s plenty of room to criticize what one thinks is a flawed methodology, of course.</span></div><p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);"> </span></p><div style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);">Best,</span></div><div style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);">Mira</span></div><p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: "Times New Roman", serif; color: rgb(10, 47, 65);"> </span></p>
<div style="margin-right: 0in; margin-left: 0in; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(225, 225, 225) currentcolor currentcolor;"><div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="font-family: Calibri, sans-serif; font-size: 11pt;"><b>From:</b> Lingtyp <lingtyp-bounces@listserv.linguistlist.org>
<b>On Behalf Of </b>Martin Haspelmath via Lingtyp<br>
<b>Sent:</b> Sunday, November 23, 2025 11:56 PM<br>
<b>To:</b> Juergen Bohnemeyer <jb77@buffalo.edu>; Peter Arkadiev <peterarkadiev@yandex.ru>; Linguistic Typology <lingtyp@listserv.linguistlist.org><br>
<b>Subject:</b> Re: [Lingtyp] Reporting cross-linguistic frequencies</span></div>
</div><p class="MsoNormal" style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
</p><p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">I agree with Peter that the corpus-based methods employed by Hawkins, Wälchli, Cysouw, Levshina and others have been very important,
and also with Jürgen that "when confronting the causal inference problem in typology, we must consider every source of evidence that we can get our hands on."</span></p><p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">But I don't agree with Peter that "the whole enterprise [of overcoming genealogical and areal biases] does not appear to be very
productive", and I don't agree with Jürgen that we "must eventually move from secondary data typology to primary data typology".</span></p><p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">I think that the enterprise of controlling for family and contact effects is absolutely necessary, because otherwise we cannot
distinguish outcomes of universal/non-historical factors from outcomes of historical events. Peter recognizes this implicitly when he says that we should "combine experimental research ... with a quantitative study of variation in corpora across a small number
of sufficiently distinct languages". That's precisely the point: Which languages are "sufficiently distinct"? And hasn't the search for empirical universals been *highly productive* over the last few decades? The recent paper by Verkerk et al. (2025) has found
good evidence for most of the empirical universals that had been seriously discussed earlier, so the Greenbergian universals seem to be very robust findings compared to many other prestigious claims in linguistics.</span></p><p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">And I think that there is no reason to abandon secondary-data typology just because we can also (increasingly) do primary-data
typology. Typological comparison can be done at multiple scales and multiple levels of granularity, and it is not clear that we can dispense with any of these levels. For example, we want to do typology of phonological segments (along the lines of the Phoible.org
database), or typology of word meanings (lexification typology, cf. </span><span style="font-family: Aptos, sans-serif; font-size: 12pt; color: blue;"><u><a href="https://clics.clld.org/" originalsrc="https://clics.clld.org/" data-outlook-id="c1d3d11d-7030-4493-b067-a2fb603bd8e4" style="color: blue; margin-top: 0px; margin-bottom: 0px;">https://clics.clld.org/</a></u></span><span style="font-family: Aptos, sans-serif; font-size: 12pt;">),
and for these, it seems that secondary data will not be easily replaced.</span></p><p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">Best,</span></p><p class="WordSection1" style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;">Martin</span></p><div style="margin-right: 0in; margin-left: 0in;"><span style="font-family: Aptos, sans-serif; font-size: 12pt;"> </span><br class="webkit-block-placeholder"></div><div style="margin: 0in 0px; font-family: Aptos, sans-serif; font-size: 12pt;">
On 21.11.25 16:04, Juergen Bohnemeyer wrote:</div>
<blockquote style="margin-top: 5pt; margin-bottom: 5pt;"><div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="">Dear Peter — I’m a massive fan of corpus-based typology. More broadly, there is no question in my mind that we should, and must, eventually move from secondary data typology to primary data typology. Nobody seems to deny that secondary
data typology is fraught with too many problematic idealizations: in particular, it reduces entire languages to single observations, and it suffers from incomparable decisions on what is treated as a language in different parts of the world. </span></div><p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style=""> </span></p><div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="">(The second problem is closely related to, but not entirely identical with, the countability problem Ian Joo mentions. The fact that
<i>language</i> is a count noun is a powerful illustration of how ordinary language can frame reality in ways that may impede scientific progress if it goes unchecked, as Whorf pointed out. However, actually counting languages is not the issue for regression-based
modeling, since regression models don’t operate on counts. But the question whether what is treated as an observation (i.e., a language) is uniform across the sample is of course very much a concern for the validity of sampling-based and regression-based modeling
alike.)</span></div><p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style=""> </span></p><div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="">There is a broader answer to your question, though: as a matter of course, when confronting the causal inference problem in typology (i.e., when hunting for the causal forces that shape languages), we must consider every source of
evidence that we can get our hands on. Aside from corpus-based typology, this includes field-based psycholinguistics and the toolkit of evolutionary linguistics, including simulations and miniature artificial language experiments. </span></div><p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style=""> </span></p><div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="">Let me also suggest a distinction between methods that are primarily geared toward the discovery of typological distributions and the examination of their statistical properties, and methods that can be used to test hypotheses of
causal inference (i.e., explanatory hypotheses). Experimental research of the kind I just mentioned is useful primarily for testing explanatory hypotheses. Corpus-based research can serve both functions. But if we want to use corpora to discover typological
distributions, we’ll need very large parallel corpus databases, such as are now being developed. </span></div><p class="MsoNormal" style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style=""> </span></p><div style="margin: 0in; font-family: Aptos, sans-serif; font-size: 12pt;">
<span style="">Best — Juergen</span></div>
</blockquote></div></div></div></div></blockquote></div><br></body></html>