[Lingtyp] spectrograms in linguistic description and for language comparison

Dingemanse, Mark Mark.Dingemanse at mpi.nl
Fri Dec 16 21:22:31 UTC 2022

At the risk of bringing a sprawling and hugely interesting thread somewhat back on track, I agree with Adam Tallman's very first post that single spectrograms (and, similarly, pitch tracks) are sometimes treated with undue veneration in descriptive work. Reviewers' insistence on them may be dictated more by what Gigerenzer has called 'statistical rituals' than by true evidentiary requirements.

I agree with others upthread that, in this respect, interactional linguistics and conversation analysis provide useful models for the qualitative analysis of such data: models that are more attentive to the situated production of speech, and perhaps also less epistemologically naïve. Work on phonetics and prosody by Local & Walker, Betty Couper-Kuhlen, Richard Ogden and others comes to mind.

While everybody seems to agree that single examples are at best illustrative (to use Christian Lehmann's term), I haven't seen much discussion of another aspect of Adam's original post, namely how to deal with larger numbers of examples.

Anyone who can generate a single pitch track can also generate many, and this is often useful, if only because visualizing one's data makes it possible to spot outliers (and, hopefully, make sense of them). For instance, I attach a figure from Andreas Liesenfeld's work on Cantonese (Liesenfeld 2019), which gives a useful impression of attested diversity by showing pitch tracks and relative vowel frequencies for 41 tokens. I'm using pitch tracks here because that was one of Adam's qualms, but of course sociolinguists have led the way in the quantitative comparison of vowel qualities and other sociolinguistic variables.
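To make the many-tokens idea concrete, here is a minimal sketch of one way to put multiple pitch tracks on a common footing and flag candidate outliers. It assumes F0 has already been extracted (e.g. in Praat) as one array of Hz values per token, voiced frames only. The function names, the semitone conversion relative to the pooled median, the 20-point time normalization and the z-score threshold of 2 are illustrative assumptions, not the procedure behind the Liesenfeld (2019) figure.

```python
import numpy as np

def to_semitones(f0_hz, ref_hz):
    """Convert F0 values (Hz) to semitones relative to a reference frequency."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

def time_normalize(track, n_points=20):
    """Linearly resample a pitch track to n_points, so tokens of different durations line up."""
    track = np.asarray(track, dtype=float)
    old_x = np.linspace(0.0, 1.0, len(track))
    new_x = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_x, old_x, track)

def flag_outlier_tokens(tracks_hz, n_points=20, z_threshold=2.0):
    """Return time-normalized semitone contours and the indices of tokens
    whose contour lies unusually far from the mean contour."""
    # Reference frequency: median F0 over all frames of all tokens
    # (assumes one speaker, or tracks already normalized per speaker).
    ref = np.median(np.concatenate([np.asarray(t, dtype=float) for t in tracks_hz]))
    contours = np.vstack([time_normalize(to_semitones(t, ref), n_points) for t in tracks_hz])
    mean_contour = contours.mean(axis=0)
    # Root-mean-square distance of each token's contour from the mean contour
    dist = np.sqrt(((contours - mean_contour) ** 2).mean(axis=1))
    if dist.std() == 0:
        # All tokens equally far from the mean: nothing stands out
        return contours, np.array([], dtype=int)
    z = (dist - dist.mean()) / dist.std()
    return contours, np.flatnonzero(z > z_threshold)

# Hypothetical example: ten falling contours of varying length plus one rising token
tracks = [np.linspace(200, 150, 30 + i) for i in range(10)] + [np.linspace(150, 300, 30)]
contours, flagged = flag_outlier_tokens(tracks)
```

Flagged tokens are candidates for a closer look (a recording glitch, a different speaker, or a genuinely distinct variant), not tokens to be discarded; plotting all rows of `contours` together gives the kind of overview shown in the attached figure.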


Acoustic data in general is hard to compare because of its continuous and multidimensional nature. Linguists seem to naturally gravitate towards discrete representations like IPA (as in Mielke 2018). But there are some recent developments in bioacoustics that I think may also interest linguists and typologists. This work makes it possible to compare spectrograms and other acoustic data at scale using methods of dimensionality reduction (e.g., Sainburg et al. 2020).
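As a rough illustration of the general idea, the sketch below turns each token's spectrogram into a fixed-length vector and projects all tokens into two dimensions, so that similar-sounding tokens end up close together. Sainburg et al. (2020) use UMAP for the projection; to keep the example dependency-free it swaps in PCA (via NumPy's SVD), which is linear and may miss structure that UMAP can capture. The frame length, hop size, 16-frame time normalization, function names and synthetic tones are all assumptions for illustration, not the pipeline of Liesenfeld & Dingemanse (2022).

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram in dB from Hann-windowed, overlapping frames."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)
    return 20.0 * np.log10(magnitude + 1e-10)

def fixed_size(spec, n_frames=16):
    """Resample a spectrogram along the time axis to n_frames, so that
    tokens of different durations yield vectors of the same length."""
    old_t = np.linspace(0.0, 1.0, spec.shape[0])
    new_t = np.linspace(0.0, 1.0, n_frames)
    return np.stack([np.interp(new_t, old_t, spec[:, k]) for k in range(spec.shape[1])], axis=1)

def project_2d(specs):
    """Flatten fixed-size spectrograms and project them onto their first two
    principal components (PCA via SVD of the mean-centered data matrix)."""
    X = np.stack([s.ravel() for s in specs])
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T  # one (x, y) point per token

# Hypothetical example: pure tones at two frequencies, five durations each (8 kHz sampling)
sr = 8000
tokens = [np.sin(2 * np.pi * f * np.arange(int(sr * d)) / sr)
          for f in (312.5, 1250.0) for d in (0.20, 0.25, 0.30, 0.35, 0.40)]
coords = project_2d([fixed_size(log_spectrogram(t)) for t in tokens])
```

With real speech one would plot `coords` and inspect the clusters by listening to their members. If the umap-learn package is installed, the PCA step could be replaced by `umap.UMAP(n_components=2).fit_transform(X)` on the same flattened matrix.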

We've recently explored the use of these methods on conversational speech data for what we call 'bottom-up discovery' of structure and variation in response tokens (Liesenfeld & Dingemanse 2022). One useful consequence of this kind of approach is that we don't need to limit ourselves to what can be written down (orthographically, phonemically or phonetically) but can also work with representations that are closer to the original speech signals. This allows us to capture gradience and variation while at the same time visualizing larger scale patterns and distinctions.

Needless to say, I don't think these kinds of methods (or any methods) can stand alone; we always benefit from methodological triangulation, and even single illustrative examples will continue to have their place. But perhaps this shows at least the possibility and utility of using richer representations of acoustic material in comparative studies.

Refs cited:

Couper-Kuhlen, E., & Ford, C. E. (2004). Sound patterns in interaction: Cross-linguistic studies of phonetics and prosody for conversation. Amsterdam: John Benjamins.

Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606. doi: 10.1016/j.socec.2004.09.033

Liesenfeld, A. (2019). Cantonese turn-initial minimal particles: Annotation of discourse-interactional functions in dialog corpora. Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, 471–479. https://liesenf.github.io/publication/paclic33/

Liesenfeld, A., & Dingemanse, M. (2022). Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages. Proceedings of Interspeech 2022. doi: 10.21437/Interspeech.2022-11288

Local, J., & Walker, G. (2005). Methodological imperatives for investigating the phonetic organization and phonological structures of spontaneous speech. Phonetica, 62(2–4), 120–130. doi: 10.1159/000090093

Mielke, J. (2018). Visualizing phonetic segment frequencies with density-equalizing maps. Journal of the International Phonetic Association, 48(2), 129–154. doi: 10.1017/S0025100317000123

Ogden, R. (2012). Making sense of outliers. Phonetica, 69(1–2), 48–67. doi: 10.1159/000343197

Sainburg, T., Thielk, M., & Gentner, T. Q. (2020). Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires. PLOS Computational Biology, 16(10), e1008228. doi: 10.1371/journal.pcbi.1008228

Best regards,



Mark Dingemanse, PhD

Associate Professor, Language & Communication, Radboud University

PI, Elementary Particles of Conversation

[Attached figure (pastedImage.png): http://listserv.linguistlist.org/pipermail/lingtyp/attachments/20221216/fcf4bfea/attachment-0001.png]