30.802, FYI: February 2019 Newsletter - LDC

Wed Feb 20 04:15:21 UTC 2019

LINGUIST List: Vol-30-802. Tue Feb 19 2019. ISSN: 1069 - 4875.

Subject: 30.802, FYI: February 2019 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Tue, 19 Feb 2019 23:14:28
From: Membership Office [ldc at ldc.upenn.edu]
Subject: February 2019 Newsletter - LDC

In this newsletter:

Only two weeks left to enjoy 2019 membership discounts

Spring 2019 LDC Data Scholarship recipients

LDC’s new language game

New publications:
DEFT Chinese Committed Belief Annotation
IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b
Multi-Language Conversational Telephone Speech 2011 -- Arabic Group
Multilingual ATIS

Only two weeks left to enjoy 2019 membership discounts
There is still time to save on 2019 membership fees. Through March 1, all
organizations receive a discount on the 2019 membership fee (up to 10%) when
they choose to join or renew. For more information on membership benefits,
visit Join LDC. 

Spring 2019 LDC Data Scholarship recipients
Congratulations to the recipients of LDC's Spring 2019 Data Scholarships:

Colin Annand: University of Cincinnati (USA); Ph.D. Psychology. Colin is
awarded a copy of Switchboard-1 Release 2 for his research involving the
relationship between speech patterns and conversation content.

Si Chen: Huazhong University of Science and Technology (China); B.S.
Communication Engineering. Si is awarded a copy of ACE 2005 Multilingual
Training Corpus for his work on event extraction.

Noor-e-Hira: Fatima Jinnah Women University (Pakistan); MSc. Computer
Sciences. Noor is awarded a copy of NIST 2008 Open Machine Translation
(OpenMT) Evaluation for her research in machine translation.

Matthew Roddy: Trinity College Dublin (Ireland); Ph.D. Electrical Engineering.
Matthew is awarded copies of 2000 HUB5 English Evaluation Speech and
Transcripts for his work in spoken dialogue systems.

Ammara Zafar: Fatima Jinnah Women University (Pakistan); MSc. Computer
Sciences. Ammara is awarded a copy of NIST 2009 Open Machine Translation
(OpenMT) Evaluation for her research in machine translation.

For information about the program, visit the Data Scholarship page.

LDC’s new language game
LDC’s new language game, NameThatLanguage, tests your skill at recognizing the
language spoken in short audio clips. The game includes thousands of clips to
prevent memorization and offers a real challenge that increases as you
progress. In addition to being fun, the game provides useful data on language
confusability and linguistic diversity. Game results will be shared freely for
research. New clips and more languages continue to be added, providing ongoing
challenges and new research data. Help support language research by playing!
https://namethatlanguage.org

New publications:

(1) DEFT Chinese Committed Belief Annotation was developed by LDC and consists
of approximately 83,000 tokens of Chinese discussion forum text annotated for
"committed belief," which marks the level of commitment displayed by the
author to the truth of the propositions expressed in the text.

DARPA's Deep Exploration and Filtering of Text (DEFT) program aimed to address
remaining capability gaps in state-of-the-art natural language processing
technologies related to inference, causal relationships, and anomaly
detection. LDC supported the DEFT program by collecting, creating, and
annotating a variety of data sources.

DEFT Chinese Committed Belief Annotation is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.  

*

(2) IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b was developed
by Appen for the IARPA (Intelligence Advanced Research Projects Activity)
Babel program. It contains approximately 210 hours of Lithuanian
conversational and scripted telephone speech collected in 2013 and 2014 along
with corresponding transcripts.

The Lithuanian speech in this release represents that spoken in the
Aukštaitian and Samogitian dialect regions of Lithuania. The gender
distribution among speakers is approximately equal; speakers' ages range from
16 years to 71 years. Calls were made using different telephones (e.g.,
mobile, landline) from a variety of environments including the street, a home
or office, a public place, and inside a vehicle.

IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b is distributed via
web download.

2019 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2019
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 

*

(3) Multi-Language Conversational Telephone Speech 2011 -- Arabic Group was
developed by LDC and comprises approximately 117 hours of telephone speech in
three distinct dialects of colloquial Arabic: Iraqi, Levantine, and Maghrebi.

The data were collected primarily to support research and technology
evaluation in automatic language identification, and portions of these
telephone calls were used in the NIST 2011 Language Recognition Evaluation
(LRE). LRE 2011 focused on language pair discrimination for 24
languages/dialects, some of which could be considered mutually intelligible or
closely related.

Multi-Language Conversational Telephone Speech 2011 -- Arabic Group is
distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 

*

(4) Multilingual ATIS was developed by Google Inc. and consists of 5,871
utterances from ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3
Test Data (LDC95S26) annotated and translated into Hindi and Turkish. 

The ATIS (Air Travel Information Services) collection was developed to support
the research and development of speech understanding systems. Participants
were presented with various hypothetical travel planning scenarios and asked
to solve them by interacting with partially or completely automated ATIS
systems. The resulting utterances were recorded and transcribed. Data was
collected in the early 1990s at five US sites: Raytheon BBN, Carnegie Mellon
University, MIT Laboratory for Computer Science, National Institute of
Standards and Technology, and SRI International.

The original English utterances were manually translated into Hindi and
Turkish. This release also includes each original English utterance and a
machine translation of the manual Hindi or Turkish translation back into
English. Each utterance is annotated with named entities via table lookup;
markers include city, airline, and airport names, and dates.

Multilingual ATIS is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data at no cost.  

Membership Office
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics

------------------------------------------------------------------------------

