30.3516, FYI: September 2019 Newsletter - LDC

Thu Sep 19 04:19:02 UTC 2019

LINGUIST List: Vol-30-3516. Thu Sep 19 2019. ISSN: 1069 - 4875.

Subject: 30.3516, FYI: September 2019 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Thu, 19 Sep 2019 00:17:50
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: September 2019 Newsletter - LDC

In this newsletter: 
LDC at Interspeech 2019

New Publications:
CALLFRIEND Canadian French Second Edition
BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training
Machine Reading Phase 1 NFL Scoring Training Data

--

LDC at Interspeech 2019
LDC is exhibiting at Interspeech 2019, September 15-19 in Graz, Austria. Stop
by Booth F16 to learn more about recent developments at the Consortium and new
publications.
Be on the lookout for The Second DIHARD Speech Diarization Challenge (DIHARD
II), a special session co-organized by LDC, and the following presentations
featuring LDC work:

The Second DIHARD Diarization Challenge: Dataset, task, and baselines
Neville Ryant, Christopher Cieri, Mark Liberman (LDC), Kenneth Church (Baidu,
USA), Alejandrina Cristia (Laboratoire de Sciences Cognitives et
Psycholinguistique), Jun Du (University of Science and Technology of China),
Sriram Ganapathy (Indian Institute of Science)
Oral Session, Tuesday September 17, 10:00 – 10:20, Hall 3

Automatic Detection of Prosodic Focus in American English
Sunghye Cho and Mark Liberman (LDC), Yong-cheol Lee (Cheongju University) 
Poster Session, Wednesday September 18, 16:00 – 18:00, Gallery B

Automatic detection of ASD in children using acoustic and text features from
brief natural conversations
Sunghye Cho, Mark Liberman, Neville Ryant (LDC), Meredith Cola, Robert T.
Schultz, Julia Parish-Morris (Children's Hospital of Philadelphia)
Oral Session, Wednesday September 18, 16:45 – 17:00, Hall 3

LDC will post conference updates via our Twitter feed and Facebook page. We
hope to see you there!   

--

New Publications:
(1) CALLFRIEND Canadian French Second Edition was developed by LDC and
consists of approximately 26 hours of unscripted telephone conversations
between native speakers of Canadian French. This second edition updates the
audio files to wav format, simplifies the directory structure, and adds
documentation and metadata. The first edition is available as CALLFRIEND
Canadian French (LDC96S48).

All data was collected before July 1997. Participants could speak with a
person of their choice on any topic; most called family members and friends.
All calls originated in North America. The recorded conversations last up to
30 minutes.

CALLFRIEND Canadian French Second Edition is distributed via web download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 

*

(2) BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training was
developed by LDC for the DARPA BOLT (Broad Operational Language Translation)
program and consists of 388,027 words of Chinese and English parallel text
enhanced with linguistic tags to indicate word relations.  

This release consists of Chinese source text message and chat conversations
collected using two methods: new collection via LDC's collection platform, and
donation of SMS and chat archives from BOLT collection participants. The
source data is released as BOLT Chinese SMS/Chat (LDC2018T15).

The BOLT word alignment task was built on treebank annotation. LDC
automatically extracted Chinese source tokens, including empty
categories/traces, from word-segmented files provided by the BOLT Chinese
Treebank annotation team at Brandeis University. The word-segmented tokens
were then used in two ways: to automatically generate the ctb (Chinese
Treebank) alignment, and, after white space was inserted between characters,
to produce the tokenization used for character alignment.
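
As a rough illustration of the character tokenization step described above,
the following Python sketch splits word-segmented tokens into single
characters separated by white space. The function name and the sample tokens
are assumptions made for this example, not part of the LDC release, and the
sketch does not try to reproduce how the corpus treats empty
categories/traces.

```python
def to_character_tokens(word_tokens):
    """Return a string with every character of every token separated by a space.

    Illustrative only: the real BOLT pipeline may treat traces, punctuation,
    or embedded Latin-script words differently.
    """
    chars = []
    for token in word_tokens:
        # Iterating over a str yields one character at a time.
        chars.extend(token)
    return " ".join(chars)


# Hypothetical word-segmented input (three tokens).
words = ["我们", "明天", "见"]

word_level = " ".join(words)             # word-segmented view
char_level = to_character_tokens(words)  # character-tokenized view

print(word_level)
print(char_level)
```

The word-level string keeps one token per segmented word, while the
character-level string exposes each character as its own alignable unit.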

BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training is
distributed via web download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

(3) Machine Reading Phase 1 NFL Scoring Training Data was developed by LDC for
use in the DARPA (Defense Advanced Research Projects Agency) Machine Reading
program. It contains 110 U.S. NFL (National Football League) scoring source
documents and 110 standoff annotation files, manually annotated for instances
of NFL Scoring annotation categories defined with respect to an NFL Scoring
ontology.
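
For readers unfamiliar with standoff annotation, the general idea can be
sketched in Python as follows: the source text is left untouched, and each
annotation is stored separately as a pair of character offsets plus a label.
The sentence, the label names, and the record layout below are invented for
illustration and do not reflect the actual file format or categories of this
corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # offset of the first annotated character
    end: int    # offset one past the last annotated character
    label: str  # hypothetical category name


def make_span(source, phrase, label):
    """Build a Span by locating phrase in source (first occurrence)."""
    start = source.index(phrase)
    return Span(start, start + len(phrase), label)


def resolve(source, spans):
    """Pair each label with the text its offsets point to."""
    return [(s.label, source[s.start:s.end]) for s in spans]


# Invented example sentence and labels, for illustration only.
source = "The home team kicked a field goal to win 20-17."
spans = [
    make_span(source, "field goal", "ScoringEvent"),
    make_span(source, "20-17", "FinalScore"),
]

for label, text in resolve(source, spans):
    print(label, "->", text)
```

Because the annotations only reference offsets, the source document can be
distributed unchanged alongside any number of annotation layers.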

The Machine Reading program aimed to develop automated reading systems to
bridge the gap between knowledge contained in natural language texts and
knowledge accessible to formal reasoning systems. The reading systems designed
by program participants were required to extract and reason about facts from
text in multiple domains.

The data in this release constitutes the training data for the NFL Scoring Use
Cases evaluation, which tested the sports domain by extracting information
about scoring events and game outcomes and aligning that information with an
NFL Scoring ontology. 

Machine Reading Phase 1 NFL Scoring Training Data is distributed via web
download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

Membership Office
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics
