36.2742, Software: KhasiBERT: Foundational Language Model for Khasi
The LINGUIST List
linguist at listserv.linguistlist.org
Mon Sep 15 15:05:02 UTC 2025
LINGUIST List: Vol-36-2742. Mon Sep 15 2025. ISSN: 1069 - 4875.
Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Homepage: http://linguistlist.org
Editor for this issue: Daniel Swanson <daniel at linguistlist.org>
================================================================
Date: 12-Sep-2025
From: B Nyalang [jasonnyal at gmail.com]
Subject: KhasiBERT: Foundational Language Model for Khasi
KhasiBERT is the first open-source AI language model trained
exclusively on Khasi-language corpora. Developed by MWire Labs, it
supports civic NLP tasks such as translation, summarization, and
search, and is designed for linguistic preservation and inclusive
digital access.
Khasi belongs to the Khasic branch of the Austroasiatic language
family and is spoken by over 1.4 million people in Northeast India.
Despite its active use, Khasi remains underrepresented in digital
infrastructure and linguistic research.
KhasiBERT is publicly available at:
https://mwirelabs.com/models/khasibert
Artifacts include:
- Model weights and training logs
- Corpus preparation methodology
This resource is intended for researchers, educators, and civic
technologists working on low-resource NLP, linguistic preservation,
and inclusive AI.
Linguistic Field(s): Computational Linguistics
                     Language Documentation
                     Text/Corpus Linguistics
Subject Language(s): Khasi (kha)