32.960, FYI: March 2021 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Tue Mar 16 11:52:16 UTC 2021


LINGUIST List: Vol-32-960. Tue Mar 16 2021. ISSN: 1069 - 4875.

Subject: 32.960, FYI: March 2021 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn, Lauren Perkins
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Nils Hjortnaes, Joshua Sims, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Tue, 16 Mar 2021 07:50:04
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: March 2021 Newsletter - LDC

 
In this newsletter: 
LDC data and commercial technology development 

New Publications:
Columbia Games Corpus
Global TIMIT Mandarin Chinese
BOLT Chinese Co-reference – Discussion Forum, SMS/Chat, and Conversational
Telephone Speech
_____
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a
prerequisite for obtaining a commercial license to almost all LDC databases.
Non-member organizations, including non-member for-profit organizations,
cannot use LDC data to develop or test products for commercialization, nor can
they use LDC data in any commercial product or for any commercial purpose. LDC
data users should consult corpus-specific license agreements for limitations
on the use of certain corpora. Visit the Licensing page for further
information.
_____

New publications:
(1) Columbia Games Corpus was developed by the Spoken Language Group, Columbia
University and the Department of Linguistics, Northwestern University. It
consists of approximately 10 hours of spontaneous English conversation from 13
subjects playing a series of computer games that required verbal communication
to achieve joint goals of identifying and moving images on the screen to reach
a combined number of points. This publication also includes corresponding
manually time-aligned orthographic transcripts and annotation marking
discourse and turn-taking.

2021 Subscription Members will automatically receive copies of this corpus
provided they have submitted a completed copy of the special license
agreement. 2021 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for a fee.

*

(2) Global TIMIT Mandarin Chinese was developed by LDC and Shanghai Jiao Tong
University and consists of five hours of read speech from Chinese Gigaword
Fifth Edition (LDC2011T13) with corresponding transcripts. Each of the fifty
speakers read 120 sentences: 20 sentences were read by all speakers, 40 were
each read by 10 speakers, and 60 were read by that speaker alone, for a total
of 3220 sentence types.
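
The figure of 3220 sentence types can be checked with a short calculation. The
sketch below assumes one reading of the design: each of the 50 speakers reads
20 + 40 + 60 sentences, and a sentence in each group is read by 50, 10, or 1
speakers respectively. The names are illustrative and are not part of the
corpus documentation.

```python
# Assumed reading of the design: 50 speakers, each reading 120 sentences.
SPEAKERS = 50
SHARED_BY_ALL = 20   # sentences per speaker that all 50 speakers read
SHARED_BY_TEN = 40   # sentences per speaker, each read by 10 speakers
UNIQUE = 60          # sentences per speaker read by that speaker only


def sentence_types(speakers, per_speaker, readers_per_sentence):
    """Count distinct sentences when each speaker reads `per_speaker`
    sentences and every sentence is read by `readers_per_sentence` speakers."""
    tokens = speakers * per_speaker
    return tokens // readers_per_sentence


types_all = sentence_types(SPEAKERS, SHARED_BY_ALL, SPEAKERS)
types_ten = sentence_types(SPEAKERS, SHARED_BY_TEN, 10)
types_unique = sentence_types(SPEAKERS, UNIQUE, 1)

total_types = types_all + types_ten + types_unique
total_tokens = SPEAKERS * (SHARED_BY_ALL + SHARED_BY_TEN + UNIQUE)

print(total_types, total_tokens)  # 3220 6000
```

Under these assumptions the three groups contribute 20, 200, and 3000 distinct
sentences, which matches the stated total, and the recordings amount to 6000
read sentences overall.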

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

(3) BOLT Chinese Co-reference – Discussion Forum, SMS/Chat, and Conversational
Telephone Speech was developed by Raytheon BBN Technologies and consists of
co-reference annotation on Chinese informal text. Co-reference annotation aims
to fill in connections between specific mentions in the text that refer to the
same entities and events in the discourse context. BOLT co-reference
annotation was performed on BOLT treebank annotation (i.e., Chinese Treebank
9.0 (LDC2016T13)) and covers noun phrases (including proper nouns, nominals,
pronouns, and null arguments), possessives, proper noun pre-modifiers, and
verbs.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104 


 



Linguistic Field(s): Computational Linguistics