29.1649, FYI: April 2018 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Tue Apr 17 19:31:03 UTC 2018


LINGUIST List: Vol-29-1649. Tue Apr 17 2018. ISSN: 1069 - 4875.

Subject: 29.1649, FYI: April 2018 Newsletter - LDC

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================


Date: Tue, 17 Apr 2018 15:30:29
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: April 2018 Newsletter - LDC

 
In this newsletter: 

LDC at ICASSP 2018
LDC at the Philadelphia Science Carnival

New Publications:

Concretely Annotated New York Times
H2, E2, ERK1 Children's Writing
TRAD Arabic-French Parallel Text -- Newsgroup

LDC at ICASSP 2018

LDC will be exhibiting at ICASSP 2018, held this year April 15-20 in Calgary,
Canada. Stop by booth B2 to learn more about recent developments at the
Consortium and new publications.

Also, be on the lookout for the following presentations featuring LDC work:

Enhancement and Analysis of Conversational Speech: JSALT 2017
Tuesday, April 17, 16:00 - 18:00
Session: Speech Analysis 

Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification

A Novel LSTM-based Speech Preprocessor for Speaker Diarization in Realistic
Mismatch Conditions
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification

LDC will post conference updates via our Twitter feed and Facebook page. We
hope to see you there!   

LDC at the Philadelphia Science Carnival

LDC will share the fun of language with the community on Saturday, April 28,
with a booth at the Philadelphia Science Carnival. Visitors will enjoy three
language-oriented educational activities that include a language
identification game and Chinese character recognition.

The Philadelphia Science Carnival is an annual event organized by
Philadelphia’s Franklin Institute to acquaint children and adults with the
joys of science.

New Publications:

(1) Concretely Annotated New York Times was developed by Johns Hopkins
University's Human Language Technology Center of Excellence. It adds multiple
kinds and instances of automatically generated syntactic, semantic, and
coreference annotations to The New York Times Annotated Corpus (LDC2008T19).
Concrete is a schema for representing structured, hierarchical, and
overlapping linguistic annotations. This release provides the outputs of
multiple tools that produce the same annotation types, represented as
different annotation theories under a shared tokenization. Concretely
Annotated New York Times contains all of the 1.8 million articles in The New
York Times Annotated Corpus.

Concretely Annotated New York Times is distributed via hard drive.

2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Any organization that licensed The New York Times Annotated Corpus
(LDC2008T19) may request a copy of Concretely Annotated New York Times
(LDC2018T12) for a $250 media fee. Non-members may license this data for a
fee.

(2) H2, E2, ERK1 Children's Writing was developed by the Cooperative State
University Baden-Württemberg, University of Education. It consists of
approximately 2,000 texts written over four months by 173 German school
children aged six through eleven. The data in this corpus was collected
by elementary schools in Baden-Württemberg, Germany, and digitized at the
Cooperative State University during the 2016/2017 school year. Three second,
third, and fourth grade classrooms participated in the collection. Texts were
written within regular class settings. The students were presented with a
picture and were asked to write a story to describe the picture or, if unable
to write a text, to list what they saw in the picture. 

Of the 173 participants, 100 were multilingual, and further metadata is
available for 166 of the children. The following is included
for each text in the database: school week of collection; school type; age;
gender; grade/classroom; language spoken at home; and school materials used.

LDC has also released H1 Children's Writing (LDC2016T01).
H2, E2, ERK1 Children's Writing is distributed via web download.

2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

(3) TRAD Arabic-French Parallel Text -- Newsgroup was developed by ELDA as
part of the PEA-TRAD project. It contains French translations of a subset of
approximately 10,000 Arabic words from GALE Phase 1 Arabic Newsgroup Parallel
Text - Part 1 (LDC2009T03). The PEA-TRAD project (Translation as a Support for
Document Analysis) was supported by the French Ministry of Defense (DGA). Its
purpose was to develop speech-to-speech translation technology for multiple
languages (e.g., Arabic, Chinese, Pashto) from a variety of domains. This
release consists of 398 segments (translation units) from 17 documents. The
source data is Arabic newsgroup text collected and translated into English by
LDC for the DARPA GALE (Global Autonomous Language Exploitation) program.
LDC has also released TRAD Chinese-French Parallel Text -- Blog (LDC2018T02).

TRAD Arabic-French Parallel Text -- Newsgroup is distributed via web download.

2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.
 



Linguistic Field(s): Computational Linguistics





 


