31.334, FYI: January 2020 Newsletter - Linguistic Data Consortium

The LINGUIST List linguist at listserv.linguistlist.org
Thu Jan 23 18:22:07 UTC 2020


LINGUIST List: Vol-31-334. Thu Jan 23 2020. ISSN: 1069 - 4875.

Subject: 31.334, FYI:  January 2020 Newsletter - Linguistic Data Consortium

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================


Date: Thu, 23 Jan 2020 13:22:01
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: January 2020 Newsletter - Linguistic Data Consortium

 
In this newsletter: 

Renew Your LDC Membership Today
LREC Workshop on Citizen Linguistics – Call for Papers

New Publications:

Abstract Meaning Representation (AMR) Annotation Release 3.0 
Database of Word Level Statistics – Mandarin
LibriVox Spanish

Renew Your LDC Membership Today:
Join LDC for MY2020 while membership savings are still available. Now through
March 2, 2020, renewing MY2019 members receive a 10% discount off the 2020
membership fee. New or returning member organizations receive a 5% discount.
This year’s planned publications include Mixer 4 and 5 Speech (English
telephone speech and interviews), IARPA Babel Language Packs (telephone speech
and transcripts in underserved languages), and data from BOLT, DEFT, RATS, TAC
KBP and more. Membership remains the most economical way to access LDC
releases. Visit the Join LDC page for details on membership options and benefits.

LREC Workshop on Citizen Linguistics:
LDC researchers and their colleagues are organizing a workshop on Citizen
Linguistics and Language Resource Development at LREC 2020 (Language Resource
and Evaluation Conference) to take place on May 16, 2020. The workshop
includes an open call for papers in language-related citizen science, a
tutorial on using the new LanguageARC.org citizen linguistics portal, and a
special session on best papers using LanguageARC.

New Publications:

- Abstract Meaning Representation (AMR) Annotation Release 3.0 was developed
by LDC, SDL/Language Weaver, Inc., the University of Colorado's Computational
Language and Educational Research group, and the Information Sciences
Institute at the University of Southern California. It contains a sembank
(semantic treebank) of 59,255 English natural language sentences from
broadcast conversations, newswire, weblogs, web discussion forums, fiction,
and web text. This release updates Abstract Meaning Representation 2.0
(LDC2017T10) with new data, more annotations on new and prior data, new or
improved PropBank-style frames, enhanced quality control, and multi-sentence
annotations.

AMR captures "who is doing what to whom" in a sentence. Each sentence is
paired with a graph that represents its whole-sentence meaning in a
tree-structure. AMR utilizes PropBank frames, non-core semantic roles,
within-sentence coreference, named entity annotation, modality, negation,
questions, quantities, and so on to represent the semantic structure of a
sentence largely independent of its syntax.
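
To make the sentence-to-graph pairing concrete, here is a minimal sketch in
Python. It is not taken from the LDC release: the sentence ("The boy wants to
go."), the variable names, the frame labels (want-01, go-02), and the helper
to_penman are all illustrative assumptions. It stores an AMR as
(source, role, target) triples and prints it in the bracketed PENMAN-style
notation commonly used for AMR. Note that the variable b is reused, so the boy
fills a role in both the wanting and the going.

```python
from collections import defaultdict

# Hand-written example AMR for "The boy wants to go." (illustrative only).
# Each triple is (source variable, role, target variable or concept).
TRIPLES = [
    ("w", ":instance", "want-01"),
    ("b", ":instance", "boy"),
    ("g", ":instance", "go-02"),
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # the thing wanted is the going event
    ("g", ":ARG0", "b"),  # the boy is also the one who goes
]


def to_penman(triples, root):
    """Serialize triples to a one-line PENMAN-style string (hypothetical helper)."""
    concepts = {s: t for s, r, t in triples if r == ":instance"}
    edges = defaultdict(list)
    for s, r, t in triples:
        if r != ":instance":
            edges[s].append((r, t))

    seen = set()

    def render(var):
        # A variable that was already expanded is written as a bare reference.
        if var in seen:
            return var
        seen.add(var)
        text = f"({var} / {concepts[var]}"
        for role, target in edges[var]:
            text += f" {role} {render(target)}"
        return text + ")"

    return render(root)


print(to_penman(TRIPLES, "w"))
# (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
```

The output contains no word order, tense marking, or function words, which is
one way to see what "largely independent of its syntax" means in practice.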

Abstract Meaning Representation (AMR) Annotation Release 3.0 is distributed
via web download. 

2020 Subscription Members will automatically receive copies of this corpus.
2020 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 


- Database of Word Level Statistics – Mandarin was developed by The Hong Kong
Polytechnic University. It provides lexical characteristics of a descriptive
and statistical nature for words and nonwords of Mandarin Chinese. It is
designed for researchers particularly concerned with language processing of
isolated words. Invariant characteristics include each item's lexicality,
SAMPA, pinyin, IPA transcription, lexical tone, syllable structure, syllable
length, pinyin length, segment length, dominant PoS, lexical frequency of the
dominant PoS, percent of that dominant PoS, and other PoSes associated with
the given item.

Database of Word Level Statistics – Mandarin is distributed via web download. 

2020 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2020
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 

- LibriVox Spanish consists of approximately 73 hours of Spanish read speech
and transcripts. The audio data was taken from Spanish audiobooks developed by
LibriVox, a non-profit project that creates audiobooks from public domain
works. The transcripts were developed for this release.

The audio consists of sentences from 300 books read by 154 speakers (77
men and 77 women), representing native and non-native Spanish read speech.
Audio files were manually segmented and are between three and ten seconds in
length. Native Spanish speakers transcribed the audio data.

LibriVox Spanish is distributed via web download. 

2020 Subscription Members will automatically receive copies of this corpus.
2020 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104
 



Linguistic Field(s): Computational Linguistics





 


