31.2285, FYI: July 2020 Newsletter - LDC

Thu Jul 16 01:54:57 UTC 2020

LINGUIST List: Vol-31-2285. Wed Jul 15 2020. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims

Homepage: http://linguistlist.org

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Wed, 15 Jul 2020 21:51:01
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: July 2020 Newsletter - LDC

In this newsletter:
Penn Parsed Corpora of Historical English Now Available From LDC
Fall 2020 LDC Data Scholarship Program

New Publications:
Speech Sentiment Annotations
Penn Parsed Corpora of Historical English
IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b
BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training

---

Penn Parsed Corpora of Historical English Now Available From LDC
LDC is pleased to announce that the Penn Parsed Corpora of Historical English
(LDC2020T16) – an important community resource for 20 years – is now available
for licensing in the LDC Catalog. Developed by University of Pennsylvania
researchers in the Linguistics Department under the direction of Professor
Anthony Kroch, this data set consists of syntactic annotation of English prose
texts from the earliest Middle English documents (1100 CE) up to the period of
the First World War (1914 CE).

Current licensees should contact LDC’s membership office with any questions
regarding access to this data set. 

Fall 2020 LDC Data Scholarship Program
Student applications for the Fall 2020 LDC Data Scholarship program are being
accepted now through September 15, 2020. This scholarship program provides
eligible students with no-cost access to LDC data. Students must complete an
application consisting of a data use proposal and letter of support from their
advisor.

For application requirements and program rules, please visit the LDC Data
Scholarship page.

---

New Publications:
(1) Speech Sentiment Annotations was developed by Google Inc. and consists of
sentiment labels (positive, negative, neutral) for approximately 49,500
utterances covering 140 hours of audio from Switchboard-1 Release 2
(LDC97S62).

Speech Sentiment Annotations is distributed via web download. 

Non-members may license this data for a fee. 

*

(2) Penn Parsed Corpora of Historical English was developed at the University
of Pennsylvania and consists of running texts and text samples of British
English prose from the earliest Middle English documents (1100 CE) up to the
period of the First World War (1914 CE). This data set contains three corpora
covering traditionally recognized periods of English.

Penn Parsed Corpora of Historical English is distributed via web download. 

Non-members may license this data for a fee.

*

(3) IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b was developed by
Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel
program. It contains approximately 204 hours of Javanese conversational and
scripted telephone speech collected in 2014 and 2015 along with corresponding
transcripts.

IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b is distributed via
web download. 

Non-members may license this data for a fee.

*

(4) BOLT Chinese-English Word Alignment and Tagging -- Conversational
Telephone Speech Training was developed by LDC and consists of 158,651 words
of Chinese and English parallel text enhanced with linguistic tags to indicate
word relations.

BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone
Speech Training is distributed via web download. 

Non-members may license this data for a fee.

Linguistic Field(s): Computational Linguistics
