35.2527, FYI: September 2024 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Tue Sep 17 03:05:11 UTC 2024


LINGUIST List: Vol-35-2527. Tue Sep 17 2024. ISSN: 1069 - 4875.

Subject: 35.2527, FYI: September 2024 Newsletter - LDC

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Joel Jenkins <joel at linguistlist.org>

================================================================


Date: 16-Sep-2024
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: September 2024 Newsletter - LDC


In this newsletter:
LDC data and commercial technology development

New publications:
L2-KSU Native and Non-Native Arabic Speech
MATERIAL Somali-English Language Pack

________________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a
prerequisite for obtaining a commercial license to almost all LDC
databases. Non-member organizations, including non-member for-profit
organizations, cannot use LDC data to develop or test products for
commercialization, nor can they use LDC data in any commercial
product or for any commercial purpose. LDC data users should consult
corpus-specific license agreements for limitations on the use of
certain corpora. Visit the Licensing page for further information.
________________________________________

New publications:
L2-KSU Native and Non-Native Arabic Speech was developed by King Saud
University (KSU) and contains approximately six hours of Modern
Standard Arabic read speech from 80 subjects, along with transcripts
and speaker metadata.

The speech data was collected in 2022 from 40 native and 40 non-native
speakers. Native speakers were from Saudi Arabia, Egypt, and
Palestine, and provided audio recordings through the crowdsourcing
platform Khamsat. Non-native speakers were Central and West African
students enrolled in KSU's Arabic Linguistics Institute; they provided
speech recordings on site. All subjects read a series of ten
sentences, repeating each sentence multiple times.

2024 members can access this corpus through their LDC accounts
provided they have submitted a completed copy of the special license
agreement. Non-members may license this data for a fee.

*

MATERIAL Somali-English Language Pack was developed by Appen for the
IARPA (Intelligence Advanced Research Projects Activity) MATERIAL
(Machine Translation for English Retrieval of Information in Any
Language) program. It contains 80 hours of Somali conversational
telephone speech, transcripts, English translations, annotations, and
queries.

Calls were made using different telephones (e.g., mobile, landline)
from a variety of environments. Transcripts cover approximately 10% of
the speech files, and approximately 4% of the speech files were
translated into English. This release also includes domain
annotations, English queries, and their relevance annotations.

The MATERIAL program focused on underserved languages, with the
ultimate goal of building cross-language information retrieval systems
that find speech and text content using English search queries.

2024 members can access this corpus through their LDC accounts
provided they have submitted a completed copy of the special license
agreement. Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics




------------------------------------------------------------------------------

********************** LINGUIST List Support ***********************
Please consider donating to the LINGUIST List to support the student editors:

https://www.paypal.com/donate/?hosted_button_id=87C2AXTVC4PP8

LINGUIST List is supported by the following publishers:

Bloomsbury Publishing http://www.bloomsbury.com/uk/

Brill http://www.brill.com

Cambridge University Press http://www.cambridge.org/linguistics

De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton

Equinox Publishing Ltd http://www.equinoxpub.com/

European Language Resources Association (ELRA) http://www.elra.info

John Benjamins http://www.benjamins.com/

Language Science Press http://langsci-press.org

Lincom GmbH https://lincom-shop.eu/

Multilingual Matters http://www.multilingual-matters.com/

Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/

Oxford University Press http://www.oup.com/us

Wiley http://www.wiley.com




