33.3585, FYI: November 2022 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Wed Nov 16 08:28:43 UTC 2022


LINGUIST List: Vol-33-3585. Wed Nov 16 2022. ISSN: 1069 - 4875.

Subject: 33.3585, FYI: November 2022 Newsletter - LDC

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Wed, 16 Nov 2022 08:28:30
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: November 2022 Newsletter - LDC

 
In this newsletter: 

Join LDC for membership year 2023 
It’s time to renew your LDC membership for 2023. Current (2022) members who
renew their membership before March 1, 2023, will receive a 10% discount. New
or returning organizations will receive a 5% discount if they join the
Consortium by March 1, 2023.

In addition to receiving new publications, current LDC members enjoy the
benefit of licensing older data from our Catalog of 900+ holdings at reduced
fees. Current-year for-profit members may use most data for commercial
applications.

For full descriptions of all LDC data sets, browse our Catalog. Visit the Join
LDC page for details on membership, user accounts, and payment.

Spring 2023 data scholarship application deadline
Applications for the Spring 2023 LDC data scholarship program, which provides
university students with no-cost access to LDC data, are being accepted
through January 15, 2023. Consult the LDC Data Scholarships page for more
information about program rules and submission requirements.
______________________________

New publications:
BOLT English Translation Treebank – Egyptian Arabic SMS/Chat was developed by
LDC and consists of SMS and chat text data (472 files representing 98,206
tokens) translated from Egyptian Arabic to English and annotated for
part-of-speech and syntactic structure. Only the translated English text is
included in the source data for this release. Part-of-speech and treebank
annotation conformed to Penn Treebank II style, incorporating changes to those
guidelines that were developed under the GALE (Global Autonomous Language
Exploitation) program. Supplementary guidelines for English treebanks and web
text are included in the corpus documentation. 

2022 members can access this corpus through their LDC accounts. Non-members
may license this data for a fee.
*
Samrómur Children Icelandic Speech 1.0 was developed by the Language and Voice
Lab, Reykjavik University in cooperation with Almannarómur, Center for
Language Technology. The corpus contains 131 hours of Icelandic prompted
speech from 3,175 speakers (children, aged 4-17 years) representing 137,597
utterances.

Speech data was collected between October 2019 and September 2021 using the
Samrómur website, which displayed prompts to participants. The prompts were
mainly from The Icelandic Gigaword Corpus, which includes text from novels,
news, plays, and from a list of location names in Iceland. Additional prompts
were taken from the Icelandic Web of Science, and others were created by
combining a name with a following question or demand. Prompts and speaker
metadata are included in the corpus.

2022 members can access this corpus through their LDC accounts provided they
have submitted a completed copy of the special license agreement. Non-members
may license this data for a fee.
*
Third DIHARD Challenge Development was developed by LDC and contains
approximately 34 hours of English and Chinese speech data along with
corresponding annotations used in support of the Third DIHARD Challenge.

The Third DIHARD development and evaluation sets were drawn from diverse
sources including monologues, map task dialogues, broadcast interviews,
sociolinguistic interviews, meeting speech, speech in restaurants, clinical
recordings, and amateur web videos. Annotations include diarization and
segmentation.

2022 members can access this corpus through their LDC accounts. Non-members
may license this data for a fee.


 



Linguistic Field(s): Computational Linguistics