37.822, Software: KokborokBERT - Masked Language Model for Kokborok
The LINGUIST List
linguist at listserv.linguistlist.org
Fri Feb 27 23:05:02 UTC 2026
LINGUIST List: Vol-37-822. Fri Feb 27 2026. ISSN: 1069 - 4875.
Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Daniel Swanson <daniel at linguistlist.org>
================================================================
Date: 27-Feb-2026
From: Badal Nyalang [nyalang at mwirelabs.com]
Subject: KokborokBERT - Masked Language Model for Kokborok
MWire Labs has released KokborokBERT, a masked language model (MLM)
for the Kokborok language of Northeast India.
The model was developed through domain-adaptive fine-tuning of
XLM-RoBERTa-base on a curated Kokborok corpus (372,850 tokens).
Training was conducted for 13 epochs on an NVIDIA A40 GPU.
Validation Results:
- Zero-shot XLM-R perplexity: 396.69
- KokborokBERT perplexity: 5.90
- Approximately 67× reduction in perplexity (396.69 / 5.90 ≈ 67.2)
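
The arithmetic behind the reported reduction can be checked with a short sketch. It assumes the perplexities were computed in the usual way, as the exponential of the mean per-token cross-entropy (in nats) on masked validation tokens; the announcement does not state the exact evaluation procedure, and the function and variable names below are illustrative only.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood (nats).

    Assumption: this is the standard definition; the announcement does not
    say how its perplexity figures were computed.
    """
    return math.exp(mean_nll)


# Figures reported in the announcement.
XLMR_PPL = 396.69
KOKBOROKBERT_PPL = 5.90

# Ratio of the two perplexities.
ratio = XLMR_PPL / KOKBOROKBERT_PPL

# Under the assumed definition, each perplexity implies a mean loss,
# and the perplexity ratio equals exp of the drop in mean loss.
loss_xlmr = math.log(XLMR_PPL)
loss_kokborokbert = math.log(KOKBOROKBERT_PPL)
loss_drop = loss_xlmr - loss_kokborokbert

print(f"perplexity ratio: {ratio:.1f}x")  # 67.2x
print(f"implied drop in mean cross-entropy: {loss_drop:.3f} nats")
```

Read this way, the 67× figure describes a ratio of perplexities, which corresponds to a drop of roughly 4.2 nats in mean per-token loss rather than a 67-fold drop in the loss itself.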
The model is designed to support downstream tasks such as named entity
recognition, text classification, and linguistic analysis. It may
serve as a foundation for future NLP research and resource development
for Kokborok.
Model repository and documentation:
https://huggingface.co/MWirelabs/kokborokbert
Researchers working on Kokborok and related languages are welcome to
use the model and provide feedback.
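
For readers who want to try the model, the following is a minimal sketch of masked-token prediction. It assumes the repository above loads through the Hugging Face `transformers` fill-mask pipeline under the ID `MWirelabs/kokborokbert` and that it keeps the XLM-RoBERTa mask token `<mask>`; neither is confirmed by the announcement, so consult the model card. The helper names `mask_word` and `top_predictions` are hypothetical.

```python
def mask_word(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token.

    Assumption: "<mask>" is the mask token, as in XLM-RoBERTa.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def top_predictions(masked_sentence: str,
                    model_id: str = "MWirelabs/kokborokbert",
                    top_k: int = 5):
    """Return (token, score) pairs for the masked position.

    Requires the `transformers` package with a backend such as PyTorch,
    plus network access to download the model on first use.
    """
    from transformers import pipeline  # imported lazily; heavy dependency

    fill = pipeline("fill-mask", model=model_id)
    results = fill(masked_sentence, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]
```

To use it, supply a Kokborok sentence of your own and the word to hide, for example `top_predictions(mask_word(sentence, word))`. No example sentence is given here because the announcement provides none.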
Linguistic Field(s): Applied Linguistics
                     Computational Linguistics
                     Language Acquisition
Subject Language(s): Kok Borok (trp)
Language Family(ies): Sino-Tibetan
                      Tibeto-Burman