34.2237, FYI: July 2023 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Tue Jul 18 12:05:06 UTC 2023


LINGUIST List: Vol-34-2237. Tue Jul 18 2023. ISSN: 1069 - 4875.

Subject: 34.2237, FYI: July 2023 Newsletter - LDC

Moderators: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Daniel Swanson, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn, Natasha Singh, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: 17-Jul-2023
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: July 2023 Newsletter - LDC


In this newsletter:
LanguageArc featured in Babel magazine
Fall 2023 LDC Data Scholarship Program

New publications:
Mixer 7 Spanish Speech
LORELEI Thai Representative Language Pack
________________________________________

LanguageArc featured in Babel magazine
The May 2023 edition of Babel (The Language Magazine) features an
article about LDC’s citizen science portal LanguageArc (Language
Analysis Research Community) and the diverse projects available there,
which use a variety of novel incentives to supplement traditional
methods of creating data resources. Consider LanguageArc for your next
collection project. Note: a subscription is necessary to view the
article.

Fall 2023 LDC Data Scholarship Program
Student applications for the Fall 2023 LDC Data Scholarship program
are being accepted now through September 15, 2023. This program
provides eligible students with no-cost access to LDC data. Students
must complete an application consisting of a data use proposal and a
letter of support from their advisor. For application requirements and
program rules, visit the LDC Data Scholarships page.
________________________________________

New publications:
Mixer 7 Spanish Speech was developed by LDC and contains 9,600 hours
of audio recordings of interviews, transcript readings, and
conversational telephone speech involving 191 distinct native Spanish
speakers. This material was collected by LDC in 2011-2012 as part of
the Mixer project, and the recordings were used in the 2012 NIST SRE
test set.

Recruited speakers were connected through a robot operator to carry on
casual conversations of up to 10 minutes on a pre-set topic.
Participants also visited LDC’s human subjects collection lab, equipped
with a 14-microphone array, where they participated in interviews and
transcript readings and conducted up to 3 telephone calls under
varying conditions. Selected speaker metadata was also collected.

2023 members can access this corpus through their LDC accounts. This
corpus is a members-only release and is not available for non-member
licensing. Contact ldc at ldc.upenn.edu for information about membership.

LORELEI Thai Representative Language Pack consists of over 39
million words of Thai monolingual text, 2.85 million words of found
Thai-English parallel text, and 141,000 Thai words translated from
English data. Over 186,000 words were annotated for named entities and
more than 25,000 words were annotated for entity discovery and linking
and situation frames (identifying entities, needs, and issues). Data
was collected from discussion forums, news, reference, social network,
and weblog sources.

The LORELEI (Low Resource Languages for Emergent Incidents) program
was concerned with building human language technology for low resource
languages in the context of emergent situations. Representative
languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available
separately as LORELEI Entity Detection and Linking Knowledge Base
(LDC2020T10).

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.

Linguistic Field(s): Computational Linguistics




------------------------------------------------------------------------------

Please consider donating to the Linguist List https://give.myiu.org/iu-bloomington/I320011968.html


LINGUIST List is supported by the following publishers:

American Dialect Society/Duke University Press http://dukeupress.edu

Bloomsbury Publishing (formerly The Continuum International Publishing Group) http://www.bloomsbury.com/uk/

Brill http://www.brill.com

Cambridge Scholars Publishing http://www.cambridgescholars.com/

Cambridge University Press http://www.cambridge.org/linguistics

Cascadilla Press http://www.cascadilla.com/

De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton

Dictionary Society of North America http://dictionarysociety.com/

Edinburgh University Press www.edinburghuniversitypress.com

Equinox Publishing Ltd http://www.equinoxpub.com/

European Language Resources Association (ELRA) http://www.elra.info

Georgetown University Press http://www.press.georgetown.edu

John Benjamins http://www.benjamins.com/

Lincom GmbH https://lincom-shop.eu/

Linguistic Association of Finland http://www.ling.helsinki.fi/sky/

MIT Press http://mitpress.mit.edu/

Multilingual Matters http://www.multilingual-matters.com/

Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/

Netherlands Graduate School of Linguistics / Landelijke (LOT) http://www.lotpublications.nl/

Oxford University Press http://www.oup.com/us

SIL International Publications http://www.sil.org/resources/publications

Springer Nature http://www.springer.com

Wiley http://www.wiley.com




