29.328, FYI: January 2018 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Thu Jan 18 23:12:42 UTC 2018


LINGUIST List: Vol-29-328. Thu Jan 18 2018. ISSN: 1069 - 4875.

Subject: 29.328, FYI: January 2018 Newsletter - LDC

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================


Date: Thu, 18 Jan 2018 18:12:28
From: Membership Office [ldc at ldc.upenn.edu]
Subject: January 2018 Newsletter - LDC

 
In this newsletter: 
Membership Discounts for MY2018 Still Available

New Publications:

DEFT Spanish Treebank
DIRHA English WSJ Audio
TRAD Chinese-French Parallel Text -- Blog

Membership Discounts for MY2018 Still Available
Join LDC while membership savings are still available. Now through March 1,
2018, renewing MY2017 members will receive a 10% discount on the membership
fee. New or non-consecutive member organizations will receive a 5% discount.
Membership remains the most economical way to access LDC releases. This year’s
planned publications include Multilanguage Conversational Telephone Speech,
IARPA Babel Language Packs (telephone speech and transcripts), DIRHA
(Distant-speech Interaction for Robust Home Applications), TRAD
(Chinese-French and Arabic-French parallel text), data from BOLT, DEFT,
LORELEI, RATS and TAC KBP, and more. Browse the Members pages for details on
membership options and benefits. 

New Publications:

(1) DEFT Spanish Treebank was developed by LDC and the Language and
Computation Center (CLiC), University of Barcelona. It contains treebank
annotation of international Spanish newswire text and Latin American Spanish
discussion forum data created for the DARPA Deep Exploration and Filtering of
Text (DEFT) program. DEFT Spanish Treebank supported the program's goal of
deep natural language understanding.

Newswire source files were selected from Spanish Gigaword Third Edition
(LDC2011T12) and were manually sentence-segmented for DEFT. Discussion forum
source files were selected from Spanish discussion forum source data collected
by LDC, consisting of continuous multi-posts of 100-1000 words.

This release contains 114 files (54,394 tokens) of newswire data and 60 files
(55,307 tokens) of discussion forum data, all of which were annotated with
constituents and syntactic functions.

DEFT Spanish Treebank is distributed via web download.

2018 Subscription Members will receive copies of this corpus. 2018 Standard
Members may request a copy as part of their 16 free membership corpora.
Non-members may license this data for a fee.

(2) DIRHA English WSJ Audio was developed as part of the Distant-Speech
Interaction for Robust Home Applications (DIRHA) Project, which addressed
natural spontaneous speech interaction with distant microphones in a domestic
environment. It comprises approximately 85 hours of real and simulated read
speech by six native American English speakers. The target utterances were
taken from CSR-I (WSJ0) Complete (LDC93S6A), specifically the 5,000-word
subset of read speech from Wall Street Journal news text.

Speech was collected in a real apartment setting with typical domestic
background noise and inter/intra-room reverberation effects. Annotations,
speaker metadata and images of the apartment setting are also included. 

DIRHA English WSJ Audio is distributed via web download.

2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

(3) TRAD Chinese-French Parallel Text -- Blog was developed by ELDA as part of
the PEA-TRAD project. It contains French translations of a subset of
approximately 10,000 Chinese words from GALE Phase 1 Chinese Blog Parallel
Text (LDC2008T06).

The PEA-TRAD project (Translation as a Support for Document Analysis) was
supported by the French Ministry of Defense (DGA). Its purpose was to develop
speech-to-speech translation technology for multiple languages (e.g., Arabic,
Chinese, Pashto) from a variety of domains. 

The source data for TRAD Chinese-French Parallel Text is Chinese blog text
collected and translated into English by LDC for the DARPA GALE (Global
Autonomous Language Exploitation) program. Information about the ELDA
translation team, translation guidelines and validation results is contained
in the documentation accompanying this release.

TRAD Chinese-French Parallel Text -- Blog is distributed via web download.

2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

Membership Office
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104
 



Linguistic Field(s): Computational Linguistics