32.2697, FYI: August 2021 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Thu Aug 19 11:39:47 UTC 2021


LINGUIST List: Vol-32-2697. Thu Aug 19 2021. ISSN: 1069 - 4875.

Subject: 32.2697, FYI: August 2021 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn, Lauren Perkins
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Nils Hjortnaes, Joshua Sims, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Thu, 19 Aug 2021 07:39:28
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: August 2021 Newsletter - LDC

 
In this newsletter: 
LDC at Interspeech 2021  
Fall 2021 LDC Data Scholarship Program 

New Publications:
Wikipedia Spanish Speech and Transcripts 
BOLT Egyptian Arabic SMS/Chat Parallel Training Data 
________________________________________
LDC at Interspeech 2021  
LDC will be exhibiting at Interspeech 2021, held this year from August 30 to
September 3 in a hybrid in-person and virtual format. Stop by our digital booth
for a look at a selection of documents and videos describing recent
developments at the Consortium and new publications. You can also contact us
through the conference platform to schedule a chat session. 

We’ll be hosting a live virtual video event highlighting LDC’s recent speech
publications during the conference. Stay tuned for scheduling information to
come!

LDC work will be featured in the following conference sessions: 

Fearless Steps Challenge Phase-3 (FSC P3): Advancing SLT for Unseen
Channel and Mission Data Across NASA Apollo Audio
Tuesday, August 31, 20:00 
Session: In-person Oral: ASR Technologies and systems 19:00-21:00

Using Games to Augment Corpora for Language Recognition and Confusability
Wednesday, September 1, 16:20-16:40 
Session: In-person Oral: Speaker, Language, and Privacy 16:00-18:00

The Third DIHARD Diarization Challenge
Thursday, September 2, 16:00 
Session: Virtual: Speaker Diarization II 16:00-18:00

LDC will post conference links and updates via our Twitter feed and Facebook
page. We hope to “see” you at Interspeech 2021!

Fall 2021 LDC Data Scholarship Program 
Student applications for the Fall 2021 LDC Data Scholarship program are being
accepted now through September 15, 2021. This program provides eligible
students with no-cost access to LDC data. Students must complete an
application consisting of a data use proposal and letter of support from their
advisor.  

For application requirements and program rules, visit the LDC Data Scholarship
page. 
________________________________________
New Publications:

(1) Wikipedia Spanish Speech and Transcripts consists of approximately 25
hours of Spanish read speech from Wikipedia Grabada, the Spanish version of
WikiProject Spoken Wikipedia, and corresponding transcripts. Speakers (150
male, 43 female) read Wikipedia articles; the audio files were segmented and
transcribed by native Spanish speakers. Speaker metadata is included in this
release. 

Wikipedia Spanish Speech and Transcripts is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.
*
(2) BOLT Egyptian Arabic SMS/Chat Parallel Training Data was developed by LDC
and consists of approximately 723,000 tokens of Egyptian Arabic SMS/Chat data
collected for the DARPA BOLT program along with their corresponding English
translations.

The source data was manually reviewed to exclude any messages/conversations
that were not in the target language or that had sensitive content, such as
personal identifying information.

Data was manually selected for translation. Messages/conversations were
arranged in chronological order, segmented into sentence units and assigned to
translation vendors. Translators followed LDC's BOLT translation guidelines.

BOLT Egyptian Arabic SMS/Chat Parallel Training Data is distributed via web
download.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104 

 



Linguistic Field(s): Computational Linguistics





 


