34.1252, FYI: April 2023 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Tue Apr 18 11:05:09 UTC 2023


LINGUIST List: Vol-34-1252. Tue Apr 18 2023. ISSN: 1069 - 4875.

Subject: 34.1252, FYI: April 2023 Newsletter - LDC

Moderator: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Joshua Sims, Daniel Swanson, Matthew Fort, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: 17-Apr-2023
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: April 2023 Newsletter - LDC


In this newsletter:
In memoriam: Christopher Cieri 1963-2023

New publications:
Penn Korean Universal Dependency Treebank
DEFT English Light and Rich ERE Annotation
________________________________________

In memoriam: Christopher Cieri 1963-2023
With deep sadness, LDC announces the passing of Christopher Cieri, our
Executive Director. Chris led the Consortium for over 25 years,
guiding its evolution from a small data repository and research hub to
a prominent global data center.

An accomplished linguist and computer scientist and a well-read
humanist, Chris embodied the best qualities for executing the wide
range of duties demanded by his leadership role. He was a valued
colleague and friend and will be sorely missed.

All are welcome to visit our remembrance page for Chris.

________________________________________
New publications:
Penn Korean Universal Dependency Treebank contains 5,010 sentences and
132,041 tokens annotated in dependency format under the Universal
Dependencies framework. It is a conversion of Korean Treebank
Annotations Version 2.0 (LDC2006T09), which was produced in
constituency format.

The source text is newswire stories from LDC’s Korean Press Agency
collection contained in Korean Newswire (LDC2000T45). Sentences were
automatically converted for dependency annotation; the output was
manually checked. The corpus contains 112 files in CoNLL-U format, the
Universal Dependencies standard, with a mapping to their counterparts
in LDC2006T09.
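
For readers unfamiliar with CoNLL-U, the sketch below shows one way
sentence and word counts like those quoted above could be tallied from
a file in that format. It is an illustration only: the function name
count_conllu, the helper _row, and the English sample rows are
hypothetical and are not drawn from the corpus, and the newsletter does
not say how LDC arrived at its own token count, so results may differ.

```python
def count_conllu(text):
    """Return (sentences, words) for a string in CoNLL-U format.

    Assumptions: sentences are separated by blank lines, comment lines
    start with '#', and only rows with a plain integer ID are counted as
    words (multiword-token ranges like '1-2' and empty nodes like '1.1'
    are skipped).
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line closes the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# sent_id = ..."
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError("expected 10 tab-separated fields: %r" % line)
        if not in_sentence:
            sentences += 1
            in_sentence = True
        if fields[0].isdigit():
            words += 1
    return sentences, words


def _row(idx, form, upos, head, deprel):
    """Build one 10-column row; unused columns are left as '_'."""
    return "\t".join([idx, form, "_", upos, "_", "_", head, deprel, "_", "_"])


# Hypothetical two-sentence sample, not taken from the treebank.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    _row("1", "Dogs", "NOUN", "2", "nsubj"),
    _row("2", "bark", "VERB", "0", "root"),
    "",
    "# sent_id = demo-2",
    _row("1-2", "don't", "_", "_", "_"),
    _row("1", "do", "AUX", "3", "aux"),
    _row("2", "n't", "PART", "3", "advmod"),
    _row("3", "stop", "VERB", "0", "root"),
    "",
])

if __name__ == "__main__":
    print(count_conllu(SAMPLE))
```

Counting only integer-ID rows follows the Universal Dependencies
convention of treating syntactic words as the unit of annotation; a
count based on surface tokens would need to handle the range rows
differently.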

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.
*
DEFT English Light and Rich ERE Annotation was developed by LDC and
consists of 1,190 English discussion forum, newswire, and proxy
documents annotated for entities, relations, and events (ERE). Light
ERE annotation labels mentions of a target set of entity types,
together with the relations and events between and among those
entities, including coreference. Rich ERE annotation expands types
and tagging in the entities, relations, and events annotation tasks
and replaces strict event coreference with a more loosely defined
event hopper annotation.

902 documents were annotated following Light ERE annotation
guidelines. 288 documents were labeled with Rich ERE annotation in a
second pass after being annotated for Light ERE. The source data
consists of English discussion forum web text collected by LDC for the
DARPA BOLT program and contained in BOLT English Discussion Forums
(LDC2017T11); newswire documents published in various data sets
released in the TAC KBP project (Text Analysis Conference Knowledge
Base Population); and proxy documents intended to mimic government
analysis reports of newswire content published in DEFT Narrative Text
(LDC2016T07).

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.


Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics




------------------------------------------------------------------------------


LINGUIST List is supported by the following publishers:

American Dialect Society/Duke University Press http://dukeupress.edu

Bloomsbury Publishing (formerly The Continuum International Publishing Group) http://www.bloomsbury.com/uk/

Brill http://www.brill.com

Cambridge University Press http://www.cambridge.org/linguistics

Cascadilla Press http://www.cascadilla.com/

De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton

Dictionary Society of North America http://dictionarysociety.com/

Edinburgh University Press www.edinburghuniversitypress.com

Equinox Publishing Ltd http://www.equinoxpub.com/

European Language Resources Association (ELRA) http://www.elra.info

Georgetown University Press http://www.press.georgetown.edu

John Benjamins http://www.benjamins.com/

Lincom GmbH https://lincom-shop.eu/

Linguistic Association of Finland http://www.ling.helsinki.fi/sky/

Multilingual Matters http://www.multilingual-matters.com/

Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/

Netherlands Graduate School of Linguistics / Landelijke (LOT) http://www.lotpublications.nl/

Oxford University Press http://www.oup.com/us

Springer Nature http://www.springer.com

Wiley http://www.wiley.com




