31.2063, FYI: June 2020 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Wed Jun 24 01:49:43 UTC 2020


LINGUIST List: Vol-31-2063. Tue Jun 23 2020. ISSN: 1069-4875.

Subject: 31.2063, FYI: June 2020 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Tue, 23 Jun 2020 21:47:33
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: June 2020 Newsletter - LDC

 
In this newsletter: 
LDC Releases LORELEI Language Packs for COVID-19 Research

New Publications:
CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition
SemTransCNC
TAC KBP English Event Nugget Detection and Coreference - Comprehensive
Training and Evaluation Data 2014-2015
__

LDC Releases LORELEI Language Packs for COVID-19 Research
The COVID-19 pandemic has highlighted the importance of data-driven solutions
for rapid response and humanitarian relief, and its global nature demonstrates
the need for multilingual resources. To support this work, LDC is releasing data
it developed in the DARPA LORELEI (Low Resource Languages for Emergent Incidents)
program under a special no-cost license for COVID-19 research. These resources
are available in a single corpus:

LDC2020E21 LORELEI Language Packs for COVID-19 Research

This data set includes representative language packs and incident language
packs for over two dozen low-resource languages, comprising data, annotations,
basic natural language processing tools, lexicons, and grammatical resources.

For further information about this corpus and its licensing terms, see the
COVID-19 Research page on the LDC website.
__

New publications:

(1) CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition was developed by
LDC and consists of approximately 27 hours of unscripted telephone
conversations between native speakers of the Taiwan dialect of Mandarin
Chinese. This second edition converts the audio files to WAV format, simplifies
the directory structure, and adds documentation and metadata. The first
edition is available as CALLFRIEND Mandarin Chinese-Taiwan Dialect (LDC96S56).

CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition is distributed via
web download. 

2020 Subscription Members will automatically receive copies of this corpus.
2020 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 

*

(2) SemTransCNC, developed by The Hong Kong Polytechnic University, is a
semantic transparency dataset of Chinese nominal compounds built using a
series of crowd-based experiments. It contains overall semantic transparency
(OST) and constituent semantic transparency (CST) data for 1,176 dimorphemic
Chinese nominal compounds, which consist of free morphemes and have mid-range
frequencies.

SemTransCNC is distributed via web download.

2020 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2020
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

(3) TAC KBP English Event Nugget Detection and Coreference - Comprehensive
Training and Evaluation Data 2014-2015 was developed by LDC and contains
training and evaluation data produced in support of the TAC KBP English Event
Nugget Detection and Coreference tasks in 2014 and 2015. 

This release includes source documents, gold standard event nugget annotations
in multiple formats, coreference information for the nuggets, and tokenized
source documents. Source data consists of English newswire and discussion
forum text collected by LDC.

TAC KBP English Event Nugget Detection and Coreference - Comprehensive
Training and Evaluation Data 2014-2015 is distributed via web download.

2020 Subscription Members will automatically receive copies of this corpus.
2020 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St., Suite 810
Philadelphia, PA 19104
 
Linguistic Field(s): Computational Linguistics