32.240, FYI: January 2021 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Sun Jan 17 15:38:52 UTC 2021


LINGUIST List: Vol-32-240. Sun Jan 17 2021. ISSN: 1069 - 4875.

Subject: 32.240, FYI: January 2021 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Sun, 17 Jan 2021 10:37:13
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: January 2021 Newsletter - LDC

 
In this newsletter: 
Renew Your LDC Membership Today

New Publications:
LORELEI Akan Representative Language Pack
ATIS – Seven Languages
BOLT English Treebank – SMS/Chat 
________________________________________
Renew Your LDC Membership Today

Through March 1, 2021, organizations that were members in 2020 receive a 10%
discount on 2021 membership, and new or returning organizations receive a 5%
discount.
Membership remains the most economical way to access current and past LDC
releases. Consult Join LDC for more details on membership options and
benefits. 
________________________________________
New Publications:
(1) LORELEI Akan Representative Language Pack consists of Akan monolingual
text, Akan-English parallel text, annotations, supplemental resources, and
related software tools developed by LDC for the DARPA LORELEI program.

Data was collected from discussion forum, news, reference, social network, and
weblog sources. Data volumes are as follows:
- Over 3.3 million words of Akan monolingual text, all of which were
translated into English
- 115,000 Akan words translated from English data

Approximately 2,300 words were annotated for named entities, full entity
(including nominals and pronouns), entity linking, simple semantic annotation,
and situation frame annotation (identifying entities, needs, and issues).
Around 2,000 words have morphological segmentation annotation.

LORELEI Akan Representative Language Pack is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

(2) ATIS – Seven Languages was developed by Amazon Web Services, Inc. and
consists of 5,871 English utterances from ATIS (Air Travel Information
Services) corpora, specifically ATIS2 (LDC93S5), ATIS3 Training Data
(LDC94S19), and ATIS3 Test Data (LDC95S26), translated into six languages:
Spanish, German, French, Portuguese, Chinese, and Japanese.

The data is separated into 4,978 utterances for training and 893 utterances
for testing, following the original ATIS division. The source English
utterances were manually translated into the six languages and are included in
this release. Utterances are annotated with named entities via table lookup;
markers include city, airline, and airport names, and dates.

ATIS – Seven Languages is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data at no cost.

*

(3) BOLT English Treebank – SMS/Chat was developed by LDC and consists of
English SMS and text chat data with part-of-speech and syntactic structure
annotation.

The source data consists of 115,667 tokens/words in 484 files of English SMS
and text chat collected by LDC using two methods: new collection via LDC's
collection platform and donation of SMS or chat archives from BOLT collection
participants.

BOLT English Treebank – SMS/Chat is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus.
2021 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.

*

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics

 







