36.793, Confs: The 2025 BabyLM Workshop (China)

Wed Mar 5 06:05:05 UTC 2025

LINGUIST List: Vol-36-793. Wed Mar 05 2025. ISSN: 1069 - 4875.

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Erin Steitz <ensteitz at linguistlist.org>

================================================================

Date: 05-Mar-2025
From: Aaron Mueller [amueller at bu.edu]
Subject: The 2025 BabyLM Workshop

The 2025 BabyLM Workshop

Date: 05-Nov-2025 - 09-Nov-2025
Location: Suzhou, China
Meeting URL: https://babylm.github.io

Linguistic Field(s): Cognitive Science; Computational Linguistics;
Language Acquisition; Psycholinguistics

BabyLM aims to bring together multiple disciplines to answer an
enduring question: how can a computational system learn language from
limited inputs? Cognitive scientists investigate this question by
trying to understand how humans learn their native language during
childhood. Computer scientists tackle this question by attempting to
build efficient machine-learning systems to accomplish this task.
BabyLM brings these two communities together, asking how insights from
cognitive science can be used to assemble more sample-efficient
language models and how language modeling architectures can inspire
research in cognitive science.

Previously, BabyLM has been organized as a competition, challenging
participants to train a language model on a human-sized amount of
data, up to 100 million words. This year, we expand the scope of
BabyLM by presenting it as a workshop. While we will still run the
competition, we also invite original research papers at the
intersection of cognitive science and language modeling without entry
into any competition track (see suggested topics below).

*Competition Tracks*
We are keeping the strict, strict-small, and multimodal tracks from
previous years. These allow participants to train on 100M words, 10M
words, or 100M words and unlimited visual data, respectively.

This year, we introduce the **interaction** track, which facilitates
the exploration of feedback and interaction with LLM agents during
pretraining. Pretrained language models may serve as teacher models,
generating textual supervision for the student models to use as
training signals; however, student models must still be trained on
100M words or fewer.

*Workshop Topics*
The BabyLM workshop encourages interdisciplinary submissions at the
interface of language modeling, cognitive science, language
acquisition, and/or evaluation. To this end, we will accept papers on
a variety of topics, including but not limited to the following:
* Data-efficient architectures and training methods
* Data curation for efficient training
* Cognitively and linguistically inspired language modeling and
evaluation
* Scaling laws; large and small model comparisons
* Cognitively inspired multimodal modeling or evaluation

*Submission and Key Dates*
We will accept submissions through ACL Rolling Review (ARR) or
directly through OpenReview. Paper submissions to the workshop can
ignore competition entry deadlines. Exact dates will be determined
based on official EMNLP guidelines as they become available.
* Early February: Call for papers released
* End of February: Training data released
* End of April: Evaluation pipeline released
* May 19: ARR submission deadline
* Mid-late July: Direct submission deadline
* Mid-August: Direct submission reviews due; ARR commitment deadline
* Early September: Decisions released
* Mid-September: Camera-ready due
* 5-9 November: Workshop at EMNLP in Suzhou

Submissions will be made through OpenReview. Submissions can be full
archival papers (or non-archival upon request) and can be up to eight
pages in length. Formatting requirements will follow standards for
EMNLP 2025 workshops. This includes length and anonymity requirements
upon submission. Reviewing will be double-blind. As before, we will
allow dual submission; however, we do not allow dual publication.

Papers submitted to the workshop will be evaluated on merit and
relevance. For competition participants, acceptance is lenient; we
plan only to reject competition submissions that make incorrect or
unjustified claims, that have significant technical issues, that do
not reveal enough methodological details for replication, or that
demonstrate only minimal time investment. Feedback will largely be
directed toward improving submissions.

See the BabyLM website for more details: https://babylm.github.io

*Contact*
For questions and discussions related to the competition or workshop,
please join the BabyLM Slack channel. A link is provided on the BabyLM
website.
