36.3838, FYI: December 2025 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Mon Dec 15 17:05:02 UTC 2025


LINGUIST List: Vol-36-3838. Mon Dec 15 2025. ISSN: 1069 - 4875.

Subject: 36.3838, FYI: December 2025 Newsletter - LDC

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Daniel Swanson <daniel at linguistlist.org>

================================================================


Date: 15-Dec-2025
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: December 2025 Newsletter - LDC


In this newsletter:
LDC 2026 membership discounts now available
LDC’s 1,000th corpus
Approaching deadline for Spring 2026 data scholarship applications
LDC closed for Winter Break December 25 – January 2
New publications:
2021 NIST Speaker Recognition Evaluation Development and Test Set
LORELEI Sinhala Incident Language Pack
________________________________________
LDC 2026 membership discounts now available
Now through March 2, 2026, any organization that joins the Consortium
or renews its membership will receive a 10% discount on the 2026
membership fee. Membership remains the most economical way to access
current and past LDC releases. Consult the Join LDC page for details on
membership options and benefits.
LDC’s 1,000th corpus
LDC is delighted to announce the release of the 1,000th corpus into
the Catalog! This milestone represents the commitment we made over
thirty years ago to provide large quantities of diverse data, robust
research program support, and exceptional member services. We are
grateful for the continued support and collaboration of our members,
friends, and the community.
Approaching deadline for Spring 2026 data scholarship applications
Attention students: don’t miss out on the chance to receive no-cost
access to LDC data for your research. Applications for Spring 2026
data scholarships are due January 15, 2026. For more information on
requirements and program rules, see the LDC Data Scholarships page.
LDC closed for Winter Break December 25 – January 2
LDC will be closed from Thursday, December 25, 2025, through Friday,
January 2, 2026, in accordance with the University of Pennsylvania
Winter Break Policy. Our offices will reopen on Monday, January 5,
2026. Requests received by the Membership Office during Winter Break
will be processed when the office reopens.
________________________________________
New publications:
2021 NIST Speaker Recognition Evaluation Development and Test Set was developed by LDC
and NIST (National Institute of Standards and Technology). It contains
approximately 447 hours of Cantonese, Mandarin, and English
conversational telephone speech, audio from video, and selfie image
data for development and test, along with answer keys, enrollment,
trial files, and documentation from the NIST-sponsored 2021 Speaker
Recognition Evaluation (SRE).
The SRE task is speaker detection, that is, to determine whether a
specified target speaker was speaking during a segment of speech.
SRE21 focused on telephone speech and audio from video and included
close-up images of participants. The evaluation also featured
cross-lingual trials, that is, enrollment and test segments spoken in
different languages.
The data was drawn from the WeCanTalk corpus collected by LDC, in which
speakers called friends or relatives who agreed to record their
telephone conversations lasting 8 to 10 minutes. Subjects
contributed multiple conversational telephone speech recordings and
audio recordings in which they were talking, plus a single selfie
image. Recordings were manually audited to verify speaker, language,
and quality.
2025 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.
*
LORELEI Sinhala Incident Language Pack was developed by LDC and
comprises 8.1 million words of Sinhala monolingual text, 700,000
words of English monolingual text, 6.4 million words of parallel
Sinhala-English text, and 50,000 words annotated for entity discovery
and linking and situation frames. It constitutes all of the text data,
annotations, supplemental resources, and related software tools for
the Sinhala language used in the DARPA LORELEI / LoReHLT 2018
Evaluation.
The LORELEI (Low Resource Languages for Emergent Incidents) program
was concerned with building human language technology for low resource
languages in the context of emergent situations. In the evaluation
scenario, an unforeseen event triggered a need for humanitarian and
logistical support in a region where the incident language had
received little or no attention in NLP research. Evaluation
participants provided NLP solutions, including information extraction
and machine translation, with limited resources and limited
development time.
Data was collected from news, social network, weblog, newsgroup,
discussion forum, and reference material. Entity discovery and linking
annotation identified entities to be detected by systems for scoring
purposes. Situation frame analysis was designed to extract basic
information about needs and relevant issues for planning a disaster
response effort.
2025 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics








