31.696, FYI: February 2020 Newsletter - LDC
The LINGUIST List
linguist at listserv.linguistlist.org
Tue Feb 18 18:54:53 UTC 2020
LINGUIST List: Vol-31-696. Tue Feb 18 2020. ISSN: 1069 - 4875.
Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Please support the LL editors and operation with a donation at:
https://funddrive.linguistlist.org/donate/
Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================
Date: Tue, 18 Feb 2020 13:54:48
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: February 2020 Newsletter - LDC
In this newsletter:
Only Two Weeks Left to Enjoy 2020 Membership Discounts
LREC Workshop on Citizen Linguistics - Deadline Extended
New Publications:
TAC KBP English Event Argument - Training and Evaluation Data 2014-2015
Chinese CogBank
Machine Reading Phase 1 IC Training Data
IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b
Only Two Weeks Left to Enjoy 2020 Membership Discounts
There is still time to save on 2020 Membership fees. Through March 2, all
organizations receive a discount on the 2020 Membership fee (up to 10%) when
they choose to join or renew. For more information on membership benefits,
see the Join LDC page on the LDC website.
LREC Workshop on Citizen Linguistics - Deadline Extended
LDC researchers and their colleagues are organizing a workshop on Citizen
Linguistics and Language Resource Development at LREC 2020 (Language Resources
and Evaluation Conference) to take place on May 16, 2020. The workshop
includes an open call for papers in language-related citizen science, a
tutorial on using the new LanguageARC.org citizen linguistics portal, and a
special session on best papers using LanguageARC. The Call for Papers deadline
has been extended to February 24, 2020.
New Publications:
- TAC KBP English Event Argument - Training and Evaluation Data 2014-2015 was
developed by LDC and contains training and evaluation data produced in support
of the 2014 TAC KBP English Event Argument Extraction Pilot and Evaluation
tasks and the 2015 English Event Argument Extraction and Linking Training and
Evaluation tasks.
The Event Argument Extraction and Linking task required systems to extract
event arguments (entities or attributes playing a role in an event) from
unstructured text, indicate the role they play in an event, and link the
arguments appearing in the same event to each other. Since the extracted
information must be suitable as input to a knowledge base, systems constructed
tuples indicating the event type, the role played by the entity in the event,
and the most canonical mention of the entity from the source document. The
event types and roles were drawn from an externally specified ontology of 31
event types, which included financial transactions, communication events, and
attacks.
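As an illustration only, the tuples described above might be represented as in the sketch below. The class name, field names, document and event identifiers, and the example type and role labels are assumptions made for this sketch; they are not taken from the TAC KBP task specification or from the corpus file formats.

```python
from collections import defaultdict
from typing import NamedTuple


class EventArgument(NamedTuple):
    """One extracted argument (hypothetical layout, not the official format)."""
    doc_id: str             # source document the argument was extracted from
    event_id: str           # identifier shared by arguments of the same event
    event_type: str         # one of the ontology's event types
    role: str               # role the entity plays in the event
    canonical_mention: str  # most canonical mention of the entity in the document


def link_by_event(arguments):
    """Group arguments by (doc_id, event_id) so arguments of one event are linked."""
    events = defaultdict(list)
    for arg in arguments:
        events[(arg.doc_id, arg.event_id)].append(arg)
    return dict(events)


# Invented example values, for illustration only.
args = [
    EventArgument("doc1", "e1", "Attack", "Attacker", "the rebels"),
    EventArgument("doc1", "e1", "Attack", "Place", "the capital"),
    EventArgument("doc1", "e2", "Communication", "Speaker", "the minister"),
]
linked = link_by_event(args)
```

Grouping on a shared event identifier is just one simple way to express the linking step; an actual system would have to decide which arguments belong to the same event in the first place.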
This corpus includes source documents, manual runs, assessments, and event
hoppers, a form of identity coreference for events (2015 only). Source data
is English newswire and discussion forum text collected by LDC.
TAC KBP English Event Argument - Training and Evaluation Data 2014-2015 is
distributed via web download.
2020 Subscription Members will automatically receive copies of this corpus.
2020 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.
- Chinese CogBank is a database of cognitive properties of Chinese words
intended for use in metaphor understanding and generation. It consists of
232,497 "word-property" pairs covering 83,104 words and 100,195 properties.
Each "word-property" pair also has an associated frequency, which can serve as
a functional measure of the importance of a property.
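To illustrate how the frequency could serve as an importance measure, a minimal sketch follows. The in-memory layout, the function name, and the sample entries (English glosses with made-up counts) are assumptions for illustration; they do not reflect the actual Chinese CogBank file format or its data.

```python
from collections import defaultdict


def top_properties(pairs, word, n=3):
    """Return up to n properties of `word`, most frequent first."""
    by_word = defaultdict(list)
    for w, prop, freq in pairs:
        by_word[w].append((prop, freq))
    ranked = sorted(by_word[word], key=lambda item: item[1], reverse=True)
    return [prop for prop, _ in ranked[:n]]


# Hypothetical (word, property, frequency) triples; all values are invented.
sample_pairs = [
    ("fox", "cunning", 120),
    ("fox", "quick", 45),
    ("snow", "white", 300),
    ("fox", "red", 60),
]
```

Under these assumptions, calling top_properties(sample_pairs, "fox", 2) ranks the properties recorded for "fox" by frequency and keeps the two most frequent.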
The data was collected via the Chinese search engine Baidu.com. The original
collection consisted of 1,258,430 types (5,637,500 tokens) of
"word-adjective" pairs, which were reduced in Chinese CogBank to 232,497
"word-property" pairs after a series of manual checks.
Chinese CogBank is distributed via web download.
2020 Subscription Members will automatically receive copies of this corpus.
2020 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.
- Machine Reading Phase 1 IC Training Data was developed by LDC for use in the
DARPA (Defense Advanced Research Projects Agency) Machine Reading program. It
contains 248 English source documents and 116 standoff annotation files,
annotated with instances of explicit relations and their arguments, as well as
some non-explicit relations.
The Machine Reading program aimed to develop automated reading systems to
bridge the gap between knowledge contained in natural language texts and
knowledge accessible to formal reasoning systems. The reading systems designed
by program participants were required to extract and reason about facts from
text in multiple domains.
The data in this release constitutes the training data for the IC (Core
Domain) task, which tested the core domain by extracting information about
Entities (people, organizations, geopolitical entities) and their involvement
in four types of Relations (Attack Relations, Biographical Relations,
Affiliation Relations and Family Relations), as described in newswire text.
This information was then aligned with an IC Use Cases ontology that would
allow automated reasoning about the extracted Entities and Relations.
Machine Reading Phase 1 IC Training Data is distributed via web download.
2020 Subscription Members will automatically receive copies of this corpus.
2020 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.
- IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b was developed by
Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel
program. It contains approximately 204 hours of Dholuo conversational and
scripted telephone speech collected in 2014 and 2015 along with corresponding
transcripts.
The Dholuo speech in this release represents the South Nyanza and Trans-Yala
dialect regions of Kenya. The gender distribution among speakers is
approximately equal; speakers' ages range from 16 years to 65 years. Calls
were made using different telephones (e.g., mobile, landline) from a variety
of environments including the street, a home or office, a public place, and
inside a vehicle.
IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b is distributed via web
download.
2020 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2020
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.
Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Linguistic Field(s): Computational Linguistics