35.1222, FYI: April 2024 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Tue Apr 16 15:05:14 UTC 2024


LINGUIST List: Vol-35-1222. Tue Apr 16 2024. ISSN: 1069 - 4875.

Moderators: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Daniel Swanson, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn, Natasha Singh, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Justin Fuller <justin at linguistlist.org>

LINGUIST List is hosted by Indiana University College of Arts and Sciences.
================================================================


Date: 15-Apr-2024
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: April 2024 Newsletter - LDC


In this newsletter:
New publications:
LoReHLT Hausa Representative Language Pack
AIDA Scenario 2 Practice Topic Source Data
________________________________________
New publications:
LoReHLT Hausa Representative Language Pack was developed by LDC and
consists of approximately 4.4 million words of Hausa monolingual
text, 86,000 Hausa words translated from English data, and 30 minutes
of Hausa audio recordings. Approximately 96,000 words were annotated
for named entities, and over 13,000 words received full entity
annotation, including nominals and pronouns. Noun-phrase chunking was
applied to more than 7,400 words, and over 9,600 words were labeled
with simple semantic annotation. Topic annotation was applied to the
audio recordings. Data was collected from discussion forums, news,
reference material, social networks, amateur web audio recordings,
and weblogs.

LoReHLT was a companion project of the DARPA LORELEI program. The
LORELEI (Low Resource Languages for Emergent Incidents) program was
concerned with building human language technology for low-resource
languages in the context of emergent situations. Representative
languages were selected to provide broad typological coverage.

2024 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

*

AIDA Scenario 2 Practice Topic Source Data was developed by LDC and
consists of 1,500 root documents (text, image, and video) from
English, Russian, and Spanish web sources. Each phase of the AIDA
program centered on a specific scenario, or broad topic area, with
related subtopics designated as either practice subtopics or
evaluation subtopics. The Phase 2 scenario focused on the
socioeconomic and political crisis in Venezuela since 2010. This
corpus constitutes the full set of topic-focused documents for Phase 2
practice subtopics.

The DARPA AIDA (Active Interpretation of Disparate Alternatives)
program aimed to develop a multi-hypothesis semantic engine to
generate explicit alternative interpretations of events, situations
and trends from a variety of unstructured sources. LDC supported AIDA
by collecting, creating and annotating multimodal linguistic resources
in multiple languages.

The knowledge base for entity detection and linking annotation for all
AIDA Scenario 1 and 2 corpora is available separately as AIDA Scenario
1 and 2 Reference Knowledge Base (LDC2023T10).

2024 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.


Linguistic Field(s): Computational Linguistics




------------------------------------------------------------------------------

Please consider donating to the Linguist List https://give.myiu.org/iu-bloomington/I320011968.html


LINGUIST List is supported by the following publishers:

Cambridge University Press http://www.cambridge.org/linguistics

De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton

Equinox Publishing Ltd http://www.equinoxpub.com/

John Benjamins http://www.benjamins.com/

Lincom GmbH https://lincom-shop.eu/

Multilingual Matters http://www.multilingual-matters.com/

Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/

Wiley http://www.wiley.com




