36.2742, Software: KhasiBERT: Foundational Language Model for Khasi

The LINGUIST List linguist at listserv.linguistlist.org
Mon Sep 15 15:05:02 UTC 2025


LINGUIST List: Vol-36-2742. Mon Sep 15 2025. ISSN: 1069 - 4875.

Subject: 36.2742, Software: KhasiBERT: Foundational Language Model for Khasi

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Daniel Swanson <daniel at linguistlist.org>

================================================================


Date: 12-Sep-2025
From: B Nyalang [jasonnyal at gmail.com]
Subject: KhasiBERT: Foundational Language Model for Khasi


KhasiBERT is the first open-source AI language model trained
exclusively on Khasi-language corpora. Developed by MWire Labs, it
supports civic NLP tasks such as translation, summarization, and
search, and is designed for linguistic preservation and inclusive
digital access.

Khasi belongs to the Khasic branch of the Austroasiatic language
family and is spoken by over 1.4 million people in Northeast India.
Despite its active use, Khasi remains underrepresented in digital
infrastructure and linguistic research.

KhasiBERT is publicly available at:
https://mwirelabs.com/models/khasibert

Artifacts include:
- Model weights and training logs
- Corpus preparation methodology

This resource is intended for researchers, educators, and civic
technologists working on low-resource NLP, linguistic preservation,
and inclusive AI.

Linguistic Field(s): Computational Linguistics
                     Language Documentation
                     Text/Corpus Linguistics

Subject Language(s): Khasi (kha)



