35.3247, FYI: November 2024 Newsletter - LDC
The LINGUIST List
linguist at listserv.linguistlist.org
Sat Nov 16 01:05:07 UTC 2024
LINGUIST List: Vol-35-3247. Sat Nov 16 2024. ISSN: 1069 - 4875.
Subject: 35.3247, FYI: November 2024 Newsletter - LDC
Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Joel Jenkins <joel at linguistlist.org>
================================================================
Date: 16-Nov-2024
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: November 2024 Newsletter - LDC
In this newsletter:
Join LDC for membership year 2025
Spring 2025 data scholarship application deadline
New publications:
LORELEI Yoruba Representative Language Pack
Samrómur Synthetic
________________________________________
Spring 2025 data scholarship application deadline
Applications for the Spring 2025 LDC data scholarship program, which
provides university students with no-cost access to LDC data, are
being accepted through January 15, 2025. Consult the LDC Data
Scholarships page for more information about program rules and
submission requirements.
________________________________________
New publications:
LORELEI Yoruba Representative Language Pack was developed by LDC and
comprises approximately 7.2 million words of Yoruba monolingual
text, 127,000 Yoruba words translated from English data, and 810,000
words of Yoruba-English parallel text. Approximately 77,000 words were
annotated for named entities, over 25,000 words were annotated for
full entity (including nominals and pronouns) and simple semantic
annotation, and around 10,000 words were annotated for noun phrase
chunking. Data was collected from discussion forums, news, reference
sources, social networks, and weblogs.
The LORELEI (Low Resource Languages for Emergent Incidents) program
was concerned with building human language technology for low resource
languages in the context of emergent situations. Representative
languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available
separately as LORELEI Entity Detection and Linking Knowledge Base
(LDC2020T10).
2024 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.
*
Samrómur Synthetic was developed by the Language and Voice Lab,
Reykjavik University and contains 72 hours of Icelandic synthetic
speech, transcripts and metadata. Source sentences were extracted from
the Samrómur platform, comprised of texts and transcripts covering
various genres. Text was processed through a text-to-speech system
developed by Reykjavik University's Language and Voice Lab to generate
speech files. Synthesized speech was created with 44 voices (22 male,
22 female) at four different speed rates for a total of 220 speakers
and 62,700 utterances (with 285 sentences/speaker).
2024 members can access this corpus through their LDC accounts
provided they have submitted a completed copy of the special license
agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Linguistic Field(s): Computational Linguistics
------------------------------------------------------------------------------