31.2598, FYI: August 2020 Newsletter - LDC

Tue Aug 18 18:57:09 UTC 2020

LINGUIST List: Vol-31-2598. Tue Aug 18 2020. ISSN: 1069 - 4875.

Subject: 31.2598, FYI: August 2020 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Tue, 18 Aug 2020 14:56:40
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: August 2020 Newsletter - LDC

In this newsletter: 
LDC adds DOI Identifier to its Language Resources
Fall 2020 LDC Data Scholarship Program

New Publications:
LORELEI Vietnamese Representative Language Pack
DEFT Chinese Light and Rich ERE Annotation
CALLFRIEND American English – Southern Dialect Second Edition

__

LDC adds DOI Identifier to its Language Resources
As of July 2020, LDC’s language resources include a Digital Object Identifier
(DOI), an internationally recognized identification standard for online
digital material. DOIs are alphanumeric strings that correspond to URLs and
metadata for specified resources. They are expressed as links that resolve to
the object’s online location. For example, the DOI for Penn Parsed Corpora of
Historical English LDC2020T16 is https://doi.org/10.35111/4hzx-5483, which
leads users to the LDC catalog entry for this data set. To facilitate its
assignment and administration of DOIs, LDC has joined DataCite, a global DOI
provider for research data. (DOIs for resources released before July 2020 will
be assigned through a process expected to be completed shortly.) LDC data sets
now have four persistent identifiers: a unique LDC number, ISBN, ISLRN, and
DOI. Adding DOIs is consistent with our aim to follow best practices for
archiving and curating digital resources, as evidenced by the CoreTrustSeal
certification, which recognizes the LDC Catalog as a trustworthy data
repository.
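
As a rough illustration of the DOI structure described above, the following
Python sketch splits a DOI link into its registrant prefix and suffix and
rebuilds the resolver link. The function names split_doi and doi_to_url are
hypothetical helpers, not part of any LDC or DataCite tooling, and the sketch
assumes the common https://doi.org/ link form shown in the example; it does
no network lookup.

```python
from typing import Tuple
from urllib.parse import urlparse

# Public DOI resolver base, as used in the newsletter's example link.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi_link: str) -> Tuple[str, str]:
    """Split a DOI link or a bare DOI into (prefix, suffix).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(doi_link)
    if parsed.scheme in ("http", "https"):
        # A full link carries the DOI in its path: /<prefix>/<suffix>
        doi = parsed.path.lstrip("/")
    else:
        # Otherwise treat the input as a bare DOI string.
        doi = doi_link.strip()
    prefix, sep, suffix = doi.partition("/")
    # DOI prefixes begin with "10." and are followed by a non-empty suffix.
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi_link!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build the resolver link for a DOI (hypothetical helper)."""
    prefix, suffix = split_doi(doi)
    return f"{DOI_RESOLVER}{prefix}/{suffix}"


if __name__ == "__main__":
    print(split_doi("https://doi.org/10.35111/4hzx-5483"))
    print(doi_to_url("10.35111/4hzx-5483"))
```

Opening the resulting link in a browser is what takes a user to the catalog
entry; the sketch only shows how the identifier and the link relate.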

Fall 2020 LDC Data Scholarship Program
Student applications for the Fall 2020 LDC Data Scholarship program are being
accepted now through September 15, 2020. This scholarship program provides
eligible students with no-cost access to LDC data. Students must complete an
application consisting of a data use proposal and letter of support from their
advisor.

For application requirements and program rules, visit the LDC Data Scholarship
page.

__

New Publications:
(1) LORELEI Vietnamese Representative Language Pack consists of Vietnamese
monolingual text, Vietnamese-English parallel text, annotations, supplemental
resources, and related software tools developed by LDC for the DARPA LORELEI
program.

Data was collected in the following genres: discussion forum, news, reference,
social network, and weblogs. Approximately 75,000 words were annotated for
named entities and up to 25,000 words contain additional annotation, including
situation frames (identifying entities, needs, and issues) and entity linking
and detection.

This corpus is distributed via web download. Non-members may license this data
for a fee.

*

(2) DEFT Chinese Light and Rich ERE Annotation contains Chinese discussion
forum web text annotated for entities, relations, and events (ERE) using the
ERE Light and ERE Rich annotation schemas developed by LDC. Light ERE
annotation labels entity mentions for the target set of ERE types between and
among those entities, including coreference. Rich ERE annotation expands types
and tagging for ERE annotation tasks and replaces event coreference with event
hopper annotation. All files in this release (157) were annotated following
Light ERE guidelines; a subset (149) were also labeled with Rich ERE
annotation.  

This corpus is distributed via web download. Non-members may license this data
for a fee.

*

(3) CALLFRIEND American English – Southern Dialect Second Edition was
developed by LDC and consists of approximately 26 hours of unscripted
telephone conversations between native speakers of Southern dialects of
American English. This second edition updates the audio files to wav format,
simplifies the directory structure, and adds documentation and metadata. 

This corpus is distributed via web download. Non-members may license this data
for a fee.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics

------------------------------------------------------------------------------
