34.1912, FYI: June 2023 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Fri Jun 16 01:05:05 UTC 2023


LINGUIST List: Vol-34-1912. Fri Jun 16 2023. ISSN: 1069 - 4875.

Subject: 34.1912, FYI: June 2023 Newsletter - LDC

Moderator: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Joshua Sims, Daniel Swanson, Matthew Fort, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: 15-Jun-2023
From: Membership Coordinator [ldc at ldc.upenn.edu]
Subject: June 2023 Newsletter - LDC


In this newsletter:
LDC at ACL 2023
LDC data and commercial technology development

New publications:
Moroccan Arabic – English Lexical Database
LORELEI Indonesian Representative Language Pack
________________________________________
LDC at ACL 2023
LDC will be exhibiting at ACL 2023, held this year July 9-14 in
Toronto, Canada. Stop by our booth to learn more about recent
developments at the Consortium and the latest publications. LDC will
post conference updates via Twitter and Facebook. We look forward to
seeing you there!

LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a
prerequisite for obtaining a commercial license to almost all LDC
databases. Non-member organizations, including non-member for-profit
organizations, cannot use LDC data to develop or test products for
commercialization, nor can they use LDC data in any commercial product
or for any commercial purpose. LDC data users should consult
corpus-specific license agreements for limitations on the use of
certain corpora. Visit the Licensing page for further information.
________________________________________
New publications:
Moroccan Arabic - English Lexical Database was developed by LDC. It
contains a set of five interrelated tables presenting each Moroccan
Arabic word as an orthographic form in Arabic script and a
pronunciation form in International Phonetic Alphabet (IPA) format.
This release contains over 21,000 Moroccan Arabic words in Arabic
script and IPA notation, and more than 33,000 English tokens.

This lexical database is the result of a collaboration with Georgetown
University Press (GUP) to enhance and update three dialectal Arabic
dictionaries -- Iraqi, Moroccan, and Syrian -- originally published in
paper form in the 1960s by GUP.  LDC also undertook to develop a
lexical database for each dialect. The Georgetown Dictionary of
Moroccan Arabic was published in 2019; this work was based on, and
expanded, A Dictionary of Moroccan Arabic.

The several enhancements developed by LDC included facilitating
comparisons across Arabic dialects and Modern Standard Arabic by
providing Arabic script spellings and IPA pronunciations for Moroccan
words and phrases; promoting ease of use by language learners and
researchers by developing reasonable orthographic conventions for
applying the Arabic alphabet to the dialect; and facilitating a user's
understanding of morphological and lexical relations by adding
information on the linguistic structures of Moroccan Arabic.

2023 members can access this corpus through their LDC accounts
provided they have submitted a signed copy of the special license
agreement. Non-members may license this data for a fee.

*

LORELEI Indonesian Representative Language Pack comprises over
17 million words of Indonesian monolingual text, 950,000 words
of found Indonesian-English parallel text, and 92,000 Indonesian words
translated from English data. Over 113,000 words were annotated for
named entities and more than 24,000 words were annotated for entity
discovery and linking and situation frames (identifying entities,
needs, and issues). Data was collected from discussion forums, news,
reference material, social networks, and weblogs.

The LORELEI (Low Resource Languages for Emergent Incidents) program
was concerned with building human language technology for low resource
languages in the context of emergent situations. Representative
languages were selected to provide broad typological coverage.

The knowledge base for entity linking annotation is available
separately as LORELEI Entity Detection and Linking Knowledge Base
(LDC2020T10).

2023 members can access this corpus through their LDC accounts.
Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account and
uncheck the box next to “Receive Newsletter” under Account Options or
contact LDC for assistance.

Linguistic Field(s): Computational Linguistics




------------------------------------------------------------------------------

Please consider donating to the Linguist List https://give.myiu.org/iu-bloomington/I320011968.html


LINGUIST List is supported by the following publishers:

American Dialect Society/Duke University Press http://dukeupress.edu

Bloomsbury Publishing (formerly The Continuum International Publishing Group) http://www.bloomsbury.com/uk/

Brill http://www.brill.com

Cambridge Scholars Publishing http://www.cambridgescholars.com/

Cambridge University Press http://www.cambridge.org/linguistics

Cascadilla Press http://www.cascadilla.com/

De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton

Dictionary Society of North America http://dictionarysociety.com/

Edinburgh University Press www.edinburghuniversitypress.com

Equinox Publishing Ltd http://www.equinoxpub.com/

European Language Resources Association (ELRA) http://www.elra.info

Georgetown University Press http://www.press.georgetown.edu

John Benjamins http://www.benjamins.com/

Lincom GmbH https://lincom-shop.eu/

Linguistic Association of Finland http://www.ling.helsinki.fi/sky/

MIT Press http://mitpress.mit.edu/

Multilingual Matters http://www.multilingual-matters.com/

Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/

Netherlands Graduate School of Linguistics / Landelijke (LOT) http://www.lotpublications.nl/

Oxford University Press http://www.oup.com/us

SIL International Publications http://www.sil.org/resources/publications

Springer Nature http://www.springer.com

Wiley http://www.wiley.com


----------------------------------------------------------
LINGUIST List: Vol-34-1912
----------------------------------------------------------


