33.1363, FYI: April 2022 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Sat Apr 16 01:03:01 UTC 2022


LINGUIST List: Vol-33-1363. Fri Apr 15 2022. ISSN: 1069 - 4875.

Subject: 33.1363, FYI: April 2022 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Billy Dickson
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Everett Green, Sarah Goldfinch, Nils Hjortnaes,
      Joshua Sims, Billy Dickson, Amalia Robinson, Matthew Fort
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Fri, 15 Apr 2022 20:55:29
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: April 2022 Newsletter - LDC

 
In this newsletter: 
LDC Celebrates 30 Years
LDC Releases Ukrainian Data for Disaster and Refugee Relief Research

New publication:
LORELEI Wolof Representative Language Pack
______________________________________

LDC Celebrates 30 Years
April 2022 marks the beginning of LDC’s 30th year as the leader in language
resource development and distribution. Founded in 1992, the Consortium has
grown from a data repository to a vibrant data center that creates, shares,
and preserves language resources for research, education, and technology
development. The Catalog continues to grow, housing over 900 titles in more
than 90 languages. With the support of members, licensees, sponsors, and
collaborators, LDC has distributed over 200,000 copies of data to more than
6,000 organizations worldwide. We are sincerely grateful to the community, and
we pledge to continue the mission to provide diverse data, high-quality member
services, and research program support. 

Stay tuned for upcoming newsletter highlights from the last three decades! 

LDC Releases Ukrainian Data for Disaster and Refugee Relief Research
LDC is releasing Ukrainian data it developed in the DARPA AIDA program, the
NIST Language Recognition Evaluation series, and the DARPA LORELEI program
under a special no-cost, limited license for disaster and refugee relief
research. 

These resources are available in three corpora:

LDC2022E06 AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts
LDC2020T24 LORELEI Ukrainian Representative Language Pack
LDC2020T10 LORELEI Entity Detection and Linking Knowledge Base

For further information about these data sets and licensing terms, see
Disaster and Refugee Relief Research.

______________________________________

New publication:
LORELEI Wolof Representative Language Pack was developed by LDC and
comprises approximately 225,000 words of Wolof monolingual text, 115,000
Wolof words translated from English data, 15,000 words annotated for named
entities, and 5,000-8,000 words annotated for entity discovery and linking and
situation frames. 

The LORELEI (Low Resource Languages for Emergent Incidents) program was
concerned with building human language technology for low resource languages
in the context of emergent situations. Representative languages were selected
to provide broad typological coverage.

Data was collected from news, social network, weblog, discussion forum, and
reference material. Entity detection and linking annotation identified
entities to be detected by systems for scoring purposes. Situation frame
analysis was designed to extract basic information about needs and relevant
issues for planning a disaster response effort.

The knowledge base for entity linking annotation is available separately as
LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10).

LORELEI Wolof Representative Language Pack is distributed via web download.  

2022 Subscription Members will automatically receive copies of this corpus.
2022 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
 



Linguistic Field(s): Computational Linguistics

 

