30.4648, Disc: Review of 'Computational Modeling of Narrative'

Mon Dec 9 16:24:29 UTC 2019

LINGUIST List: Vol-30-4648. Mon Dec 09 2019. ISSN: 1069 - 4875.

Subject: 30.4648, Disc: Review of 'Computational Modeling of Narrative'

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Mon, 09 Dec 2019 11:23:33
From: Membership Office [ldc at ldc.upenn.edu]
Subject: Review of 'Computational Modeling of Narrative'

Read Review: http://linguistlist.org/issues/24/24-2936.html 

In this newsletter: 
LDC Membership Discounts for MY2020 Still Available
Spring 2020 Data Scholarship Program – deadline approaching
Introducing LanguageARC: A Citizen Linguist Portal

New Publications:
Magic Data Chinese Mandarin Conversational Speech
BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training
TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017
__

LDC Membership Discounts for MY2020 Still Available
Join LDC while membership savings are still available. Now through March 2,
2020, current MY2019 members who renew their LDC membership receive a 10%
discount off the membership fee. New or returning member organizations receive
a 5% discount through March 2. Membership remains the most economical way to
access LDC releases. Visit the Join LDC page for details on membership options and
benefits.

Spring 2020 Data Scholarship Program – deadline approaching
Students can apply for the Spring 2020 Data Scholarship Program now through
January 15, 2020. The LDC Data Scholarship program provides students with
no-cost access to LDC data. For more information on application requirements
and program rules, please visit the LDC Data Scholarships page.

Introducing LanguageARC: A Citizen Linguist Portal
LanguageARC is a citizen science website for languages developed with a grant
from the National Science Foundation (no. 170377). Contributors to this online
community – “citizen linguists” – participate in a variety of tasks and
activities that support linguistic research, such as identifying accents from
audio clips, recording “tongue twisters,” and translating English sentences
into other languages. Data collected from LanguageARC will be made freely
available to the research community. New collection and annotation projects
will be added on an ongoing basis, and researchers will soon be able to create
their own LanguageARC projects with an easy-to-use Project Builder Toolkit.
All are encouraged to explore the site and participate in the community.
Comments, questions and suggestions are welcome via the site’s Contact page. 
__

New Publications:

(1) Magic Data Chinese Mandarin Conversational Speech was developed by Beijing
Magic Data Technology Co., Ltd. and consists of approximately 10 hours of
Mandarin conversational speech from 60 speakers. Each conversation was
recorded on multiple devices and is presented in multiple forms, resulting in
a total of approximately 60 hours of audio with corresponding transcripts. 

All participants were native speakers of Mandarin in Mainland China, drawn from
accent regions across the country. Speakers were paired for conversations on a
range of topics, including travel, fitness, games, sports, and pets. Metadata
such as topic, collection date, mobile device, and speaker demographic
information is available in the documentation accompanying this release.

Magic Data Chinese Mandarin Conversational Speech is distributed via web
download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee. 

*

(2) BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training was
developed by LDC and consists of 349,414 words of Egyptian Arabic and English
parallel text enhanced with linguistic tags to indicate word relations.

This release contains Egyptian Arabic source text message and chat
conversations collected using two methods: new collection via LDC's collection
platform, and donation of SMS or chat archives from BOLT collection
participants. The source data is released as BOLT Egyptian Arabic SMS/Chat and
Transliteration (LDC2017T07).

The BOLT word alignment task was built on treebank annotation. Egyptian Arabic
source tree tokens were automatically extracted from tree files in LDC’s BOLT
Egyptian Arabic Treebank, which had been tagged for part-of-speech and
syntactically annotated. That data was then aligned and annotated for the word
alignment task.

BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training is
distributed via web download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

(3) TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data
2016-2017 was developed by LDC and contains training and evaluation data
produced in support of the TAC KBP Entity Discovery and Linking (EDL) tasks in
2016 and 2017. This corpus includes queries, knowledge base (KB) links,
equivalence class clusters for NIL entities, and entity type information for
each of the queries. The EDL reference KB, to which EDL data are linked, is
available separately in TAC KBP Entity Discovery and Linking - Comprehensive
Training and Evaluation Data 2014-2015 (LDC2019T02).

The goal of the EDL track is to conduct end-to-end entity extraction, linking
and clustering. To produce gold standard data for a given document collection,
annotators (1) extract (identify and classify) entity mentions (queries) and
link them to nodes in a reference KB, and (2) perform cross-document
co-reference on within-document entity clusters that cannot be linked to the KB.
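
As a rough illustration of the annotation scheme described above, the
following Python sketch models mentions that are either linked to a KB node or
grouped into a cross-document NIL cluster. The class, field names, and
identifiers are hypothetical and are not taken from the LDC release format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """One annotated entity mention (query). All field names are hypothetical."""
    doc_id: str
    text: str
    entity_type: str                   # illustrative labels such as "PER" or "GPE"
    kb_id: Optional[str]               # KB node ID, or None if not in the KB
    nil_cluster: Optional[str] = None  # cross-document cluster ID for NIL entities


def group_by_entity(mentions):
    """Group mentions by the entity they refer to.

    Linked mentions are keyed by KB node ID; unlinked (NIL) mentions are
    keyed by their cross-document cluster ID.
    """
    groups = defaultdict(list)
    for m in mentions:
        key = m.kb_id if m.kb_id is not None else m.nil_cluster
        groups[key].append(m)
    return dict(groups)


# Made-up example data: two linked mentions and two NIL mentions.
mentions = [
    Mention("doc1", "Philadelphia", "GPE", kb_id="KB:0001"),
    Mention("doc2", "Philly", "GPE", kb_id="KB:0001"),
    Mention("doc1", "J. Smith", "PER", kb_id=None, nil_cluster="NIL0001"),
    Mention("doc3", "John Smith", "PER", kb_id=None, nil_cluster="NIL0001"),
]

groups = group_by_entity(mentions)
```

In this sketch, mentions from different documents end up in the same group
either because they share a KB link or because annotators placed them in the
same NIL cluster.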

Source data for the annotations consists of Chinese, English and Spanish
newswire and discussion forum text collected by LDC and is available in TAC
KBP Evaluation Source Corpora 2016-2017 (LDC2019T12).

TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017
is distributed via web download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

Membership Office
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics
                     Ling & Literature

------------------------------------------------------------------------------
