37.822, Software: KokborokBERT - Masked Language Model for Kokborok

The LINGUIST List linguist at listserv.linguistlist.org
Fri Feb 27 23:05:02 UTC 2026


LINGUIST List: Vol-37-822. Fri Feb 27 2026. ISSN: 1069 - 4875.

Subject: 37.822, Software: KokborokBERT - Masked Language Model for Kokborok

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Daniel Swanson <daniel at linguistlist.org>

================================================================


Date: 27-Feb-2026
From: Badal Nyalang [nyalang at mwirelabs.com]
Subject: KokborokBERT - Masked Language Model for Kokborok


MWire Labs has released KokborokBERT, a masked language model (MLM)
for the Kokborok language of Northeast India.

The model was developed through domain-adaptive fine-tuning of
XLM-RoBERTa-base on a curated Kokborok corpus of 372,850 tokens.
Training ran for 13 epochs on an NVIDIA A40 GPU.

Validation results:
- Zero-shot XLM-R perplexity: 396.69
- KokborokBERT perplexity: 5.90
- Reduction: roughly 67× lower perplexity (396.69 / 5.90)
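The reduction factor quoted above can be checked with a few lines of
arithmetic. The sketch below assumes the common convention that
perplexity is the exponential of the mean cross-entropy loss (in nats);
the announcement does not say how perplexity was computed, so the
implied loss values are illustrative only. The function names are
hypothetical and not part of the release.

```python
import math

# Perplexity figures quoted in the announcement.
ZERO_SHOT_PPL = 396.69
KOKBOROKBERT_PPL = 5.90


def reduction_factor(before: float, after: float) -> float:
    """Return how many times lower `after` is than `before`."""
    return before / after


def implied_loss(perplexity: float) -> float:
    """Mean cross-entropy in nats, ASSUMING perplexity = exp(loss)."""
    return math.log(perplexity)


factor = reduction_factor(ZERO_SHOT_PPL, KOKBOROKBERT_PPL)
loss_drop = implied_loss(ZERO_SHOT_PPL) - implied_loss(KOKBOROKBERT_PPL)

print(f"reduction factor: {factor:.1f}x")  # about 67.2x
print(f"implied drop in mean loss: {loss_drop:.2f} nats")
```

Under that assumption, a ratio of perplexities corresponds to a
difference of losses, which is why the second figure is a subtraction
rather than a division.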

The model is designed to support downstream tasks such as named entity
recognition, text classification, and linguistic analysis. It may
serve as a foundation for future NLP research and resource development
for Kokborok.

Model repository and documentation:
https://huggingface.co/MWirelabs/kokborokbert

Researchers working on Kokborok and related languages are welcome to
use the model and provide feedback.
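
For readers who want to try the model, the following is a minimal
sketch of masked-token prediction with the Hugging Face `transformers`
library. It assumes the repository can be loaded with the standard
`fill-mask` pipeline, which is typical for XLM-RoBERTa-based MLMs but
is not confirmed by the announcement. The model ID is taken from the
repository URL above; the helper names `load_fill_mask`,
`top_predictions`, and `demo` are hypothetical.

```python
MODEL_ID = "MWirelabs/kokborokbert"  # from the repository URL in the announcement


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline (downloads the weights on first use)."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(fill_mask, text: str, k: int = 5):
    """Return (token, score) pairs for the single masked position in `text`.

    `fill_mask` is any callable with the fill-mask pipeline interface:
    for one mask it returns a list of dicts with "token_str" and "score".
    """
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


def demo(sentence_template: str, k: int = 5) -> None:
    """Print the top-k fillers; `sentence_template` must contain "{mask}" once."""
    fill_mask = load_fill_mask()
    # Use the tokenizer's own mask token instead of hard-coding it.
    text = sentence_template.format(mask=fill_mask.tokenizer.mask_token)
    for token, score in top_predictions(fill_mask, text, k=k):
        print(f"{token!r}\t{score:.4f}")
```

To use it, call `demo()` with a Kokborok sentence in which one word has
been replaced by the literal placeholder `{mask}`. Reading the mask
token from the tokenizer avoids relying on a particular mask string.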

Linguistic Field(s): Applied Linguistics
                     Computational Linguistics
                     Language Acquisition

Subject Language(s): Kok Borok (trp)

Language Family(ies): Sino-Tibetan
                      Tibeto-Burman


