37.2024, Calls: 13th Web-as-Corpus Workshop @EMNLP2026 (Hungary)

The LINGUIST List linguist at listserv.linguistlist.org
Tue Jun 9 11:05:02 UTC 2026


LINGUIST List: Vol-37-2024. Tue Jun 09 2026. ISSN: 1069 - 4875.

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Valeriia Vyshnevetska <valeriia at linguistlist.org>

================================================================


Date: 08-Jun-2026
From: Veronika Laippala [veronika.laippala at utu.fi]
Subject: 13th Web-as-Corpus Workshop @ EMNLP2026


Full Title: 13th Web-as-Corpus Workshop @EMNLP2026
Short Title: WaC-13

Date: 29-Oct-2026 - 29-Oct-2026
Location: Budapest, Hungary
Web Site: https://wacky-workshop.github.io/

Linguistic Field(s): Computational Linguistics

Call Deadline: 07-Aug-2026

Call for Papers:

The World Wide Web has evolved from a resource for building linguistic
corpora into the central data infrastructure powering modern natural
language processing and Large Language Models (LLMs). As web-scale
data increasingly shapes AI systems’ knowledge and capabilities,
understanding its quality, representativeness, and ethical
implications has become critical.

At the same time, the “more is better” paradigm is being challenged by
issues such as machine-generated content, data toxicity, limited
metadata, and the under-representation of many languages and domains.
These challenges call for a shift toward Data-Centric AI, focusing on
the curation, analysis, and responsible use of web-derived data.

The 13th Web-as-Corpus (WaC-13) workshop provides a multidisciplinary
forum for research addressing the full lifecycle of web data. We
invite submissions on methods, resources, and applications related to
web corpora, with special emphasis on multilingual data and
less-resourced languages.

Topics of interest include (but are not limited to):
 - Creation and evaluation of high-quality datasets for foundation
   models (e.g., data collection, filtering, enrichment, language
   identification)
 - Use of web data in empirical linguistic research
 - Analysis of web-scale corpora for quality, representativeness, and
   societal insights
 - Ethical and legal aspects of collecting, sharing, and using web
   data

By bringing together researchers from NLP, linguistics, and the social
sciences, WaC aims to advance best practices for one of the field’s
most influential data sources.

Important Dates:
Direct paper submission deadline: 7 August 2026
Pre-reviewed ARR commitment deadline: 1 September 2026
Notification of acceptance: 5 September 2026
Camera-ready paper due: 20 September 2026
Workshop date: 29 October 2026

Submissions:
Submit papers directly through OpenReview:
https://openreview.net/group?id=EMNLP/2026/Workshop/WaC-13
Papers already reviewed through ARR can be committed at:
https://openreview.net/group?id=EMNLP/2026/Workshop/WaC-13_ARR_Commitment

Workshop Organizers:
Nikola Ljubešić, Jožef Stefan Institute, Slovenia
Yves Scherrer, University of Oslo, Norway
Laurie Burchell, Common Crawl Foundation
Veronika Laippala, University of Turku, Finland
Pedro Ortiz Suarez, Common Crawl Foundation
Thom Vaughan, Common Crawl Foundation
Vuk Dinić, Jožef Stefan Institute, Slovenia





