34.3470, FYI: November 2023 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Fri Nov 17 17:05:08 UTC 2023


LINGUIST List: Vol-34-3470. Fri Nov 17 2023. ISSN: 1069 - 4875.


Moderators: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Daniel Swanson, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn, Natasha Singh, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Justin Fuller <justin at linguistlist.org>
================================================================


Date: 15-Nov-2023
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: November 2023 Newsletter - LDC


In this newsletter:
Join LDC for Membership Year 2024

New publications:
REMIX Telephone Collection
News Sub-domain Named Entity Recognition
________________________________________
Join LDC for Membership Year 2024
It’s time to renew your LDC membership for 2024. Current (2023)
members who renew their membership before March 1, 2024 will receive a
10% discount. New or returning organizations will receive a 5%
discount if they join the Consortium by March 1.

Plans for 2024 publications are in progress. Among the expected
releases are:
• KASET: 147 hours of Sorani Kurdish and Kurmanji Kurdish
conversational telephone speech and web broadcasts, 65 hours
transcribed
• AIDA Topic Source Data and Annotations: multimodal source data and
annotations in multiple languages (Russian, Ukrainian, English,
Spanish) for information and entity extraction
• RATS Low Speech Density Data: 87 hours of Levantine Arabic, English,
Persian, Pushto, and Urdu audio files selected from RATS speech
activity detection and keyword spotting data sets, also including
communications systems sounds and silence
• Call My Net 1: 364 hours of conversational telephone speech
recordings in Tagalog, Cebuano, Cantonese and Mandarin from speakers
in the Philippines and China using various handsets under diverse
noise conditions
• Ravnursson Faroese Speech and Transcripts: 109 hours of read speech
from 433 native speakers with transcripts
• Diaspora Tibetan Speech: elicited, read, and spontaneous speech from
73 native Tibetan speakers in Katmandu’s diaspora Tibetan community,
some recordings transcribed
• IARPA MATERIAL language packs: conversational telephone speech,
transcripts, English translations, annotations, and queries in
multiple languages (e.g., Bulgarian, Somali, Georgian)
• LORELEI: representative and incident language packs containing
monolingual text, bi-text, translations, annotations, supplemental
resources, and related tools in various languages (e.g., Farsi,
Hungarian, Hindi, Amharic)
For full descriptions of all LDC data sets, browse our Catalog. Visit
the Join LDC page for details on membership, user accounts and payment.
________________________________________
New publications:

REMIX Telephone Collection was developed by LDC and contains 320 hours
of English conversational telephone speech from 358 speakers who had
completed all tasks in one of the previous LDC Mixer collections
(Mixers 4-7). The data was collected in 2012, and recordings in this
corpus were used to support the NIST 2012 Speaker Recognition
Evaluation. Speakers completed up to 12 calls, each lasting up to 10
minutes, conversing on suggested topics. They were asked to make half
of their calls in a "noisy" environment, e.g., on a speakerphone, on a
busy street, in a noisy store or office, or in a room with loud
background noise. Speaker metadata is included.

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

*

News Sub-domain Named Entity Recognition was developed at the
University of Pennsylvania and contains over 20,000 English news
sentences annotated with named entities and categorized into
sub-domains. The sentences were extracted from The New York Times
Annotated Corpus (LDC2008T19). Named entity annotation was based on
the CoNLL-2003 guidelines and annotation scheme. Sentences were
labeled with person (PER), location (LOC) and organization (ORG) tags
using phrase matching with a manual second pass. Sub-domains are: Arts
(+Weekend/Cultural), Business (+Financial), Classifieds (+Obituary),
Editorial, Foreign, Metropolitan, Sports, and Others. "Others"
includes topics such as Real Estate, New Jersey Weekly, Book Review,
Job Market, Science, and Health & Fitness.

2023 members can access this corpus through their LDC accounts
provided they have submitted a signed copy of the special license
agreement. Non-members may license this data for a fee.

Membership Coordinator
LDC
T: +1-215-573-1275
E: ldc at ldc.upenn.edu

Linguistic Field(s): Computational Linguistics
