34.602, FYI: February 2023 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Fri Feb 17 00:05:02 UTC 2023


LINGUIST List: Vol-34-602. Fri Feb 17 2023. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Sarah Robinson, Joshua Sims, Jeremy Coburn, Daniel Swanson, Matthew Fort, Maria Lucero Guillen Puon, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: February 2023 Newsletter - LDC


In this newsletter:
LDC membership discounts expire March 1
30th Anniversary Highlight: Arabic Treebank

New publications:
2019 NIST Speaker Recognition Evaluation Test Set – Audio-Visual
LORELEI Tagalog Representative Language Pack

________________________________________

LDC membership discounts expire March 1
Time is running out to save on 2023 membership fees. Renew your LDC
membership, rejoin the Consortium, or become a new member by March 1
to receive a discount of up to 10%. For more information on membership
benefits and options, visit the Join LDC page.

________________________________________

New publications:
2019 NIST Speaker Recognition Evaluation Test Set – Audio-Visual
contains approximately 64 hours of English audio-visual data for
development and test, answer keys, enrollment, trial files, and
documentation from the NIST-sponsored 2019 Speaker Recognition
Evaluation (SRE).

The 2019 evaluation task was speaker detection, that is, to determine
whether a specified target speaker was speaking during a segment of
speech. The evaluation was conducted in two parts: (1) a
leaderboard-style challenge based on conversational telephone speech
and (2) a separate evaluation using audio-visual data. This release
relates to the audio-visual evaluation.

The source audio-visual data was collected by LDC for the VAST (Video
Annotation for Speech Technology) project. That collection focused on
amateur video recordings from various online media hosting services.
The recordings vary in duration from 17.5 seconds to 13 minutes; most
have two audio channels (stereo), but some are monophonic (one
channel).

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

*

LORELEI Tagalog Representative Language Pack was developed by LDC and
comprises approximately 4.8 million words of Tagalog monolingual
text, 341,000 words of found Tagalog-English parallel text, and
124,000 Tagalog words translated from English data. Approximately
78,000 words were annotated for named entities, and over 26,000 words
were annotated for entity discovery and linking and for situation
frames (identifying entities, needs, and issues). Data was collected
from discussion forums, news, reference sources, social networks, and
weblogs.

The LORELEI (Low Resource Languages for Emergent Incidents) program
was concerned with building human language technology for low resource
languages in the context of emergent situations. Representative
languages were selected to provide broad typological coverage.

The knowledge base for entity linking annotation is available
separately as LORELEI Entity Detection and Linking Knowledge Base
(LDC2020T10).

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics




------------------------------------------------------------------------------


LINGUIST List is supported by the following publishers:

Bloomsbury Publishing (formerly The Continuum International Publishing Group) http://www.bloomsbury.com/uk/

Brill http://www.brill.com

Cascadilla Press http://www.cascadilla.com/

Equinox Publishing Ltd http://www.equinoxpub.com/

Georgetown University Press http://www.press.georgetown.edu

John Benjamins http://www.benjamins.com/

Lincom GmbH https://lincom-shop.eu/

Multilingual Matters http://www.multilingual-matters.com/

Netherlands Graduate School of Linguistics / Landelijke Onderzoekschool Taalwetenschap (LOT) http://www.lotpublications.nl/

Springer Nature http://www.springer.com



