36.934, FYI: March 2025 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Tue Mar 18 00:05:08 UTC 2025


LINGUIST List: Vol-36-934. Tue Mar 18 2025. ISSN: 1069 - 4875.

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Joel Jenkins <joel at linguistlist.org>

================================================================


Date: 17-Mar-2025
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: March 2025 Newsletter - LDC


In this newsletter:
LDC data and commercial technology development
New publications:
2015 NIST Language Recognition Evaluation Test Set
The Xi’an Multi-Language Learner Corpus
________________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a
prerequisite for obtaining a commercial license to almost all LDC
databases. Non-member organizations, including non-member for-profit
organizations, cannot use LDC data to develop or test products for
commercialization, nor can they use LDC data in any commercial product
or for any commercial purpose. LDC data users should consult
corpus-specific license agreements for limitations on the use of
certain corpora. Visit the Licensing page for further information.
________________________________________
New publications:
2015 NIST Language Recognition Evaluation Test Set was developed by
LDC and NIST. It contains the evaluation test set for the 2015 NIST
Language Recognition Evaluation (LRE), approximately 867 hours of
conversational telephone speech (CTS) and broadcast narrowband speech
(BNBS) collected by LDC in 20 languages over 6 clusters of related
languages: Arabic (Egyptian, Iraqi, Levantine, Maghrebi, Modern
Standard Arabic); Spanish (Caribbean, European, Latin American,
Brazilian Portuguese); English (British, Indian, General American
English); Chinese (Cantonese, Mandarin, Min Nan, Wu); Slavic (Polish,
Russian); and French (West African, Haitian Creole).
The CTS data includes calls lasting 8-15 minutes between individuals in
the same social networks, as well as telephone speech from the IARPA
Babel series collected in 2012-2013 from speakers using a range of
phone types in diverse settings with varying noise conditions. The
BNBS data was collected by LDC from streaming and satellite radio
programming, focusing on programs that included narrowband speech
(e.g., call-ins to a talk show).
The goal of NIST's LRE evaluations is to establish the baseline of
current performance capability for CTS language recognition and to lay
the groundwork for further research efforts. LRE15 expanded the range
of test segment durations and added a test condition that allowed
systems to make use of unrestricted training data when developing
models.
2025 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.
*
The Xi’an Multi-Language Learner Corpus was developed by Xi’an
International Studies University (XISU) and comprises 526
argumentative essays in 15 languages by Chinese L1 university students
studying second languages, along with student metadata and writing
prompts. It was developed to support second language learner research
and to provide a database for cross-linguistic comparison of second
languages.
Data was collected in 2023 and 2024 from students at XISU and Yunnan
Minzu University (YMU) who were linguistics majors or studying one of
the foreign languages available at XISU and YMU. Off-topic essays and
incomplete texts were excluded.
2025 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.

Linguistic Field(s): Computational Linguistics



