35.2312, FYI: August 2024 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Thu Aug 22 18:05:06 UTC 2024


LINGUIST List: Vol-35-2312. Thu Aug 22 2024. ISSN: 1069 - 4875.

Subject: 35.2312, FYI: August 2024 Newsletter - LDC

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Joel Jenkins <joel at linguistlist.org>

================================================================


Date: 15-Aug-2024
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: August 2024 Newsletter - LDC


In this newsletter:
Fall 2024 LDC Data Scholarship program

New publications:
LORELEI Uyghur Incident Language Pack
Ravnursson Faroese Speech and Transcripts
________________________________________
Fall 2024 LDC Data Scholarship program
Student applications for the Fall 2024 LDC Data Scholarship program
are being accepted now through September 15, 2024. This program
provides eligible students with no-cost access to LDC data. Students
must complete an application consisting of a data use proposal and a
letter of support from their advisor. For application requirements and
program rules, visit the LDC Data Scholarships page.
________________________________________

New publications:
LORELEI Uyghur Incident Language Pack was developed by LDC and
comprises 28 million words of Uyghur monolingual text, 500,000
words of English monolingual text, 3.3 million words of parallel and
comparable Uyghur-English text, and 200,000 words annotated for simple
named entities and situation frames. It constitutes all of the text
data, annotations, supplemental resources, and related software tools
for the Uyghur language that were used in the DARPA LORELEI / LoReHLT
2016 Evaluation.

The LORELEI (Low Resource Languages for Emergent Incidents) program
was concerned with building human language technology for low-resource
languages in the context of emergent situations. In the evaluation
scenario, an unforeseen event triggered a need for humanitarian and
logistical support in a region where the incident language had
received little or no attention in NLP research. Evaluation
participants provided NLP solutions, including information extraction
and machine translation, with limited resources and limited
development time.

Data was collected from news, social network, weblog, newsgroup,
discussion forum, and reference material. Named entity annotation
identified entities to be detected by systems for scoring purposes.
Situation frame analysis was designed to extract basic information
about needs and relevant issues for planning a disaster response
effort.

2024 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

*

Ravnursson Faroese Speech and Transcripts contains 109 hours of
Faroese prompted speech from 433 speakers (249 female, 184 male),
along with corresponding transcripts and speaker metadata. It is an extract from
the Basic Language Resource Kit 1.0 (BLARK 1.0) developed by the Faroe
Islands' Ravnur Project.

Speech data was collected in 2022. Speakers from all major dialect
areas of the Faroe Islands, in three age groups (15-35, 36-60, and
61+ years), read texts that included a word list, a phrase list,
closed-vocabulary readings, and short texts. Recordings also contain
spontaneous speech. Orthographic transcripts are included.

2024 members can access this corpus through their LDC accounts
provided they have submitted a completed copy of the special license
agreement. Non-members may license this data at no cost.

To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics




------------------------------------------------------------------------------

********************** LINGUIST List Support ***********************
Please consider donating to the Linguist List to support the student editors:

https://www.paypal.com/donate/?hosted_button_id=87C2AXTVC4PP8

LINGUIST List is supported by the following publishers:

Bloomsbury Publishing http://www.bloomsbury.com/uk/

Brill http://www.brill.com

Cambridge University Press http://www.cambridge.org/linguistics

De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton

Equinox Publishing Ltd http://www.equinoxpub.com/

European Language Resources Association (ELRA) http://www.elra.info

John Benjamins http://www.benjamins.com/

Language Science Press http://langsci-press.org

Lincom GmbH https://lincom-shop.eu/

Multilingual Matters http://www.multilingual-matters.com/

Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/

Oxford University Press http://www.oup.com/us

Wiley http://www.wiley.com


----------------------------------------------------------
LINGUIST List: Vol-35-2312
----------------------------------------------------------


