[Lingtyp] CfP – “TALKING DATA - Methodological and theoretical challenges raised by spoken interaction data” (Bologna, 9-10 Oct 2025)
Caterina Mauri
caterina.mauri at unibo.it
Mon Apr 28 16:40:49 UTC 2025
********** Apologies for cross-posting **********
Dear all,
we are pleased to announce the call for papers for the conference:
“TALKING DATA - Methodological and theoretical challenges raised by spoken interaction data”
When and where: University of Bologna, 9–10 October 2025 - Aula Prodi
Organizing Committee: Caterina Mauri, Eleonora Zucchini, Silvia Ballarè, Ludovica Pannitto
Scientific Committee: Members of the PRIN 2022 PNRR DiverSIta project (full list on the website<https://site.unibo.it/divers-ita/en/outreach-and-events/talking-data>)
The conference aims to bring together scholars working with spoken interaction data across different fields and approaches, with particular attention to the methodological and theoretical challenges that arise throughout the stages of data collection, transcription, annotation, and analysis.
The event will mark the closing of the DiverSIta project<https://site.unibo.it/divers-ita/en>, focused on the documentation of diversity in spoken Italian through the KIParla corpus (Mauri et al. 2019, www.kiparla.it<http://www.kiparla.it>). It also aims to foster dialogue and collaboration among researchers working on spoken interaction data in different languages and contexts.
Confirmed keynote speakers:
* Lorenza Mondada (University of Basel)
* Stefan Schnell (University of Zurich)
* Robbie Love (Aston University)
* Marlou Rasenberg (Radboud University)
We welcome contributions addressing (but not limited to) topics such as:
* Methodologies for spoken corpus design, data collection, transcription, and annotation
* Challenges of dealing with multilingual interactions, overlapping speech, co-constructions, disfluencies
* Data accessibility, privacy, and FAIR practices
* Typological, sociolinguistic, computational, psycholinguistic, and diachronic approaches to spoken interaction data
* Cross-disciplinary perspectives on the use of spoken corpora
For further details, please see the EXTENDED CALL pasted below and available at this link:
👉 https://site.unibo.it/divers-ita/en/outreach-and-events/talking-data
Submission information:
* Please send a one-page abstract (references excluded) in PDF format to:
➔ caterina.mauri at unibo.it<mailto:caterina.mauri at unibo.it> and eleonora.zucchini2 at unibo.it<mailto:eleonora.zucchini2 at unibo.it>
* Deadline for abstract submission: 20th May 2025
* Notification of acceptance: 31st May 2025
We warmly invite you to submit and join us in Bologna!
Best wishes,
Caterina Mauri (on behalf of the Organizing Committee)
-------------
CALL DESCRIPTION:
The conference aims to gather scholars working on spoken interaction data from a variety of perspectives, with different approaches and goals, across different fields of linguistics. We are especially interested in contributions addressing how this type of data raises both methodological and theoretical challenges all along the way, from collection, through transcription, to annotation and analysis.
The conference is the closing event of the project DiverSIta, Diversity in Spoken Italian<https://site.unibo.it/divers-ita/en>, which is dedicated mainly to the expansion of KIParla<https://kiparla.it/en/> (Mauri et al. 2019, www.kiparla.it<http://www.kiparla.it>), a corpus that aims to document spoken Italian over time, in its internal diversity of speakers and communicative situations, with a focus on naturally occurring data (Ballarè, Mauri & Goria 2022). The conference will be an opportunity to describe the corpus and the whole KIParla enterprise and to learn about further resources, in different languages, that share the focus on spoken interaction data. Participants will have the chance to discuss the theoretical and methodological challenges that this type of data raises in various fields and approaches to the study of language, and to find common or complementary objectives to pursue.
It is well known that collecting, transcribing, and publishing spoken interaction data pose more challenges than building resources of written or spoken but monological data. As a result, for many years spoken corpora were limited to so-called WEIRD and LOL languages, i.e. languages with standardized written forms (Literate), official recognition (Official), and large speaker populations (Lots of users) (Dahl 2015). Only recently have resources containing spoken data become available for a wider variety of languages, including less described ones, although a significant portion of such data consists of monological narratives (cf. Multi-CAST, Haig & Schnell 2015; SCOPIC, Barth & Evans 2021; Dingemanse & Liesenfeld 2022; DoReCo, Seifart, Paschen & Stave 2024).
Access to spoken data is crucial for various linguistic analytical perspectives that focus on language variation in a broad sense. Observing spoken interaction, despite its inherent messiness and unpredictability, is essential for developing comprehensive and accurate descriptions of language as it is truly used in real-life contexts. This approach helps mitigate biases toward overly polished or artificially structured data, allowing for a more authentic representation of linguistic diversity.
We welcome contributions discussing the issues, solutions, and challenges in building, annotating, using, and comparing corpora of spoken interaction data, also from a cross-disciplinary perspective, highlighting the role of this specific type of data in shaping linguistic analyses, linguistic models, and methodological choices. A non-exhaustive list of topics includes:
Methodologies: corpus design, data collection, transcription, annotation and publication
* Sampling and balancing: reconciling representativeness and spoken data
* Ecological and ethical practices for data collection
* Challenges and possible solutions for manual or (semi-)automated transcription
* Data formats and standards
* Data annotation: units of transcription, units of analysis, disfluencies, co-constructions, multilingual interactions, …
* Data FAIRness and accessibility: privacy protection and data sharing
* Main problems and solutions for multilingual corpora annotation
* Treebanks of spoken interactional data: how to deal with overlapping speech or utterance co-construction, ...
* LLM training based on conversational data and LLM interactional performance evaluation
* …
Analysis: Spoken interaction data in different approaches
* Language variation and spoken data: how interaction shapes internal variation
* Sociolinguistic perspectives on spoken data: to what extent can social categories explain variation in spoken language?
* Typological approaches to spoken interaction data, e.g. universal vs. language-specific phenomena, available resources
* Computational approaches to spoken interaction data, e.g. LLM training and fine-tuning, automatic detection of interactional phenomena
* Diachronic approaches to spoken interaction data, e.g. emergent constructions, studies highlighting the role of dialogical interaction in language change
* Studies on interactional data involving L2 speakers or speakers with multilingual repertoires: e.g. what can we learn about language acquisition and learners’ varieties from this type of data; how the presence of L2 speakers or speakers with complex repertoires shapes language in interaction.
* Psycholinguistic approaches, e.g. experimental settings involving spoken interactions
* …
Submission information
* Abstract submission: please send a one-page abstract (references excluded) in PDF to caterina.mauri at unibo.it<mailto:caterina.mauri at unibo.it> and eleonora.zucchini2 at unibo.it<mailto:eleonora.zucchini2 at unibo.it>
* Deadline for abstract submission: 20th May
* Notification of acceptance: 31st May
References
Barth, Danielle & Nicholas Evans (eds.). 2017-2021. Social Cognition Parallax Interview Corpus (SCOPIC). “Language Documentation & Conservation Special Publication” 12. Honolulu, University of Hawai'i Press.
Dingemanse, Mark & Andreas Liesenfeld. 2022. From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin, Association for Computational Linguistics, pp. 5614–5633.
Dobrovoljc, Kaja. 2022. Spoken Language Treebanks in Universal Dependencies: an Overview. In Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, European Language Resources Association, pp. 1798–1806.
Haig, Geoffrey & Stefan Schnell (eds.). 2015. Multi-CAST: Multilingual corpus of annotated spoken texts. (multicast.aspra.uni-bamberg.de/<http://multicast.aspra.uni-bamberg.de/>).
Mauri, Caterina, Silvia Ballarè, Eugenio Goria, Massimo Cerruti & Francesco Suriano. 2019. KIParla corpus: A new resource for spoken Italian. In CEUR Workshop Proceedings, CEUR-WS 2481, pp. 1–7.
Mauri, Caterina, Silvia Ballarè, Eugenio Goria & Massimo Cerruti. 2022. Il corpus KIParla<https://cris.unibo.it/handle/11585/912372>. In Corpora e studi linguistici. Milano, Officinaventuno, pp. 109–118.
Seifart, Frank, Ludger Paschen & Matthew Stave (eds.). 2024. Language Documentation Reference Corpus (DoReCo) 2.0. Lyon, Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2).
Yu, Wenyi, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma & Chao Zhang. 2024. Connecting speech encoder and large language model for ASR. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12637–12641.
-- -- -- -- -- -- -- --
Prof.ssa Caterina Mauri
Alma Mater Studiorum - Università di Bologna. Dipartimento di Lingue, Letterature e Culture moderne
Via Cartoleria 5, 40124, Bologna.
Homepage: https://www.unibo.it/sitoweb/caterina.mauri
Editor-in-chief of Linguistic Typology at the Crossroads <https://typologyatcrossroads.unibo.it/>
Ongoing projects: KIParla Corpus of spoken Italian<https://kiparla.it/en/> || DiverSIta: Diversity in spoken Italian<https://site.unibo.it/divers-ita/en>