35.2030, FYI: July 2024 Newsletter - LDC

Tue Jul 16 14:05:10 UTC 2024

LINGUIST List: Vol-35-2030. Tue Jul 16 2024. ISSN: 1069 - 4875.

Moderator: Francis Tyers (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Daniel Swanson, Erin Steitz

Homepage: http://linguistlist.org

Editor for this issue: Justin Fuller <justin at linguistlist.org>

LINGUIST List is hosted by Indiana University College of Arts and Sciences.
================================================================

Date: 15-Jul-2024
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: July 2024 Newsletter - LDC

In this newsletter:
LDC at IC2S2
Fall 2024 LDC Data Scholarship Program

New publications:
MATERIAL Bulgarian-English Language Pack
Dialogs Re-Enacted Across Languages
________________________________________
LDC at IC2S2
LDC is delighted to be a bronze sponsor of the 10th International
Conference on Computational Social Science (IC2S2), held this year on
Penn’s campus July 17-20. The conference will feature research from
around the world across a broad range of relevant fields to advance
the many frontiers of computational social science. Be sure to visit
LDC’s table during the poster sessions on July 18 and 19 from 1:30 to
2:30 pm.

Fall 2024 LDC Data Scholarship Program
Student applications for the Fall 2024 LDC Data Scholarship program
are being accepted now through September 15, 2024. This program
provides eligible students with no-cost access to LDC data. Students
must complete an application consisting of a data use proposal and
letter of support from their advisor. For application requirements and
program rules, visit the LDC Data Scholarships page.
________________________________________

New publications:
MATERIAL Bulgarian-English Language Pack was developed by Appen for
the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL
(Machine Translation for English Retrieval of Information in Any
Language) program. It contains 80 hours of Bulgarian conversational
telephone speech, transcripts, English translations, annotations, and
queries.

Calls were made using different telephones (e.g., mobile, landline)
from a variety of environments. Transcripts cover approximately 40% of
the speech files, and approximately 10% of the speech files were
translated into English. This release also includes domain
annotations, English queries, and their relevance annotations.

The MATERIAL program focused on underserved languages, with the
ultimate goal of building cross-language information retrieval systems
that find speech and text content using English search queries.

2024 members can access this corpus through their LDC accounts
provided they have submitted a completed copy of the special license
agreement. Non-members may license this data for a fee.

*

Dialogs Re-Enacted Across Languages was developed at the University of
Texas at El Paso. It contains 17 hours of conversational speech in
English and Spanish by 129 unique bilingual speakers. Specifically, it
consists of short fragments extracted from spontaneous conversations
and close re-enactments in the other language by the original
speakers, for 3816 pairs of matching utterances. Data was collected in
2022-2023.
Participants were recruited from among students at the University of
Texas at El Paso; all were bilingual speakers of General American
English and of Mexico-Texas Border Spanish.

Each speaker pair had a 10-minute conversation in one language.
Various fragments from these conversations were chosen for
re-enactment, and the original speakers produced equivalents in the
other language. Each re-enactment was vetted for fidelity to the
original and naturalness in the target language. Also included is
metadata about conversations, participants, re-enactments and
utterances.

2024 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics
