34.951, FYI: March 2023 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Sat Mar 18 04:05:04 UTC 2023


LINGUIST List: Vol-34-951. Sat Mar 18 2023. ISSN: 1069 - 4875.

Subject: 34.951, FYI: March 2023 Newsletter - LDC

Moderator: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Sarah Robinson, Joshua Sims, Jeremy Coburn, Daniel Swanson, Matthew Fort, Maria Lucero Guillen Puon, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org


Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: March 2023 Newsletter - LDC


In this newsletter:
LDC’s 30th anniversary year ends
LDC data and commercial technology development

New publications:
Mixer 3 Speech
LORELEI Tamil Representative Language Pack
________________________________________

LDC’s 30th anniversary year ends
We hope you enjoyed the monthly data spotlights in celebration of
LDC’s 30th anniversary year, April 2022-March 2023. We would not have
achieved this milestone without the continued support and
collaboration of our members, friends, and the community. We are
grateful. As we enter our fourth decade, we pledge to continue to
serve the community and our members by distributing high-quality,
diverse data and by providing top-notch member services and research
program support.

LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a
prerequisite for obtaining a commercial license to almost all LDC
databases. Non-member organizations, including non-member for-profit
organizations, cannot use LDC data to develop or test products for
commercialization, nor can they use LDC data in any commercial product
or for any commercial purpose. LDC data users should consult
corpus-specific license agreements for limitations on the use of
certain corpora. Visit the Licensing page for further information.
________________________________________

New publications:
Mixer 3 Speech contains 3,200 hours of conversational telephone speech
involving 3,875 speakers, 19,595 telephone recordings, and 26 distinct
languages. This material was collected by LDC from 2005 to 2007 as part
of the Mixer project, and recordings in this corpus were used in NIST
Speaker Recognition Evaluation and NIST Language Recognition
Evaluation corpora, including 2006 SRE and 2007 LRE.

Recordings were generated using LDC's computer telephony system.
Recruited speakers were connected through a robot operator to carry on
casual conversations lasting up to 10 minutes. Subjects fluent in
languages other than English were asked to complete at least one
non-English call. Metadata includes the number of calls per subject
and language, as well as speaker demographic information.

2023 members can access this corpus through their LDC accounts. This
corpus is a Members-Only release and is not available for non-member
licensing. Contact ldc at ldc.upenn.edu for information about membership.
*
LORELEI Tamil Representative Language Pack comprises over 41
million words of Tamil monolingual text, 680,000 words of found
Tamil-English parallel text, and 226,000 Tamil words translated from
English data. Approximately 78,000 words were annotated for named
entities, and over 24,000 words were annotated for entity discovery
and linking and for situation frames (identifying entities, needs, and
issues). Data was collected from discussion forums, news, reference
material, social networks, and weblogs.

The LORELEI (Low Resource Languages for Emergent Incidents) program
was concerned with building human language technology for low-resource
languages in the context of emergent situations. Representative
languages were selected to provide broad typological coverage.

The knowledge base for entity linking annotation is available
separately as LORELEI Entity Detection and Linking Knowledge Base
(LDC2020T10).

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics




