32.1361, FYI: March 2021 Newsletter - LDC

Mon Apr 19 16:28:35 UTC 2021

LINGUIST List: Vol-32-1361. Mon Apr 19 2021. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderators: Jeremy Coburn, Lauren Perkins
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Nils Hjortnaes, Joshua Sims, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Mon, 19 Apr 2021 12:28:19
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: March 2021 Newsletter - LDC

In this newsletter:
New Publications:
- X-SRL: Parallel Cross-lingual Semantic Role Labeling
- TAC KBP English Sentiment Slot Filling – Comprehensive Training and
  Evaluation Data 2013-2014
________________________________________

New publications:
(1) X-SRL: Parallel Cross-lingual Semantic Role Labeling was developed by
Heidelberg University, Department of Computational Linguistics and the Leibniz
Institute for the German Language (IDS). It consists of approximately three
million words of German, French, and Spanish annotated for semantic role
labeling. The texts are translations of the English portion of 2009 CoNLL
Shared Task Part 2 (LDC2012T04). All sentences have annotations for verbal
predicates and share the original English Propbank label set across the four
languages.

The 2009 CoNLL Shared Task developed syntactic dependency annotations
together with semantic dependencies that model the roles of both verbal and
nominal predicates. The following English data was used in the shared task:

- Treebank-2 (LDC95T7): over one million words of annotated English newswire
and other text developed by the University of Pennsylvania
- Proposition Bank I (LDC2004T14): semantic annotation of newswire text from
Treebank-2 developed by the University of Pennsylvania
- NomBank v 1.0 (LDC2008T23): argument structure for instances of common nouns
in Treebank-2 and Treebank-3 (LDC99T42), developed by New York University

For X-SRL, the English source data was automatically translated using DeepL.
Automatic tokenization, lemmatization, part-of-speech tagging, and syntactic
parsing were then applied to the text. The data was divided into train,
development, and test partitions. Semantic labels were transferred for the
train and development sections, and the test sentences were validated for
translation quality, alignment, label transfer, and filtering. 
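
The label transfer step can be illustrated with a small sketch. The
announcement does not say how labels were transferred, so this assumes the
common approach of projecting labels along word alignments. The function
name, the "_" placeholder for unlabeled tokens, and the toy sentence pair
are all hypothetical and not taken from the corpus.

```python
from typing import List, Tuple


def project_labels(
    source_labels: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
    empty: str = "_",
) -> List[str]:
    """Copy each labeled source token's label onto its aligned target token.

    alignment holds (source_index, target_index) pairs. This is an assumed,
    simplified projection scheme, not the procedure used to build X-SRL.
    """
    target_labels = [empty] * target_length
    for src_idx, tgt_idx in alignment:
        label = source_labels[src_idx]
        # Transfer only real labels; keep the first label a target token gets.
        if label != empty and target_labels[tgt_idx] == empty:
            target_labels[tgt_idx] = label
    return target_labels


# Toy example invented for illustration.
english = ["She", "sold", "the", "car"]
english_roles = ["A0", "sell.01", "_", "A1"]
german = ["Sie", "verkaufte", "das", "Auto"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

german_roles = project_labels(english_roles, alignment, len(german))
print(list(zip(german, german_roles)))
```

Real alignments are rarely one-to-one, which is one reason a projected test
set benefits from the kind of validation described above.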

X-SRL: Parallel Cross-lingual Semantic Role Labeling is distributed via web
download.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

(2) TAC KBP English Sentiment Slot Filling – Comprehensive Training and
Evaluation Data 2013-2014 was developed by LDC and contains training and
evaluation data produced in support of the 2013 and 2014 TAC KBP Sentiment
Slot Filling tracks. The data in this release includes queries, manual runs
(human-produced query responses), and assessment results for human- and
system-produced query responses. Source data was English news and web text.

The regular English Slot Filling track involved mining information about
entities from text using a specified set of "slots", or attributes. The goal
of the Sentiment Slot Filling task was to evaluate the quality of detectors
for positive and negative sentiment.  
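
As a rough illustration of how slot-filling responses can be compared
against assessed answers, the sketch below computes precision, recall, and
F1 over (entity, slot, filler) triples. It is a generic sketch and not the
official TAC KBP scorer. The Response class, the slot names, and the
entities are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Response:
    query_entity: str  # entity the query asks about
    slot: str          # attribute being filled (placeholder names below)
    filler: str        # value proposed for the slot


def precision_recall_f1(
    system: Set[Response], gold: Set[Response]
) -> Tuple[float, float, float]:
    """Score system responses against the responses assessed as correct."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented data: one response matches the gold set, one does not.
gold = {
    Response("Entity A", "positive-sentiment-toward", "Entity B"),
    Response("Entity A", "negative-sentiment-toward", "Entity C"),
}
system = {
    Response("Entity A", "positive-sentiment-toward", "Entity B"),
    Response("Entity A", "negative-sentiment-toward", "Entity D"),
}

p, r, f = precision_recall_f1(system, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```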

TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation
Data 2013-2014 is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics

------------------------------------------------------------------------------
