35.48, Confs: Latin American and Iberian Languages Open Corpora Forum

Sun Jan 7 19:05:02 UTC 2024

LINGUIST List: Vol-35-48. Sun Jan 07 2024. ISSN: 1069 - 4875.

Subject: 35.48, Confs: Latin American and Iberian Languages Open Corpora Forum

Moderators: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Daniel Swanson, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn, Natasha Singh, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Zackary Leech <zleech at linguistlist.org>
================================================================

Date: 05-Jan-2024
From: Livy Real [livyreal at gmail.com]
Subject: Latin American and Iberian Languages Open Corpora Forum

Latin American and Iberian Languages Open Corpora Forum
Short Title: OPENCOR

Date: 12-Mar-2024 - 12-Mar-2024
Location: Santiago de Compostela, Spain
Contact: Livy Real
Contact Email: livyreal at gmail.com
Meeting URL: https://opencor.gitlab.io/cfp-opencor-2024/

Linguistic Field(s): Text/Corpus Linguistics
Subject Language(s): English (eng)
                     Portuguese (por)

Meeting Description:

This will be the fifth edition of OpenCor, an annual venue that aims
to gather the community to work on freely available language resources
for the variety of languages spoken in Iberian countries and Latin
America.

Recent years have seen a move in Computational Linguistics towards
bigger and better, more reliably annotated corpora. However, the
scarcity of such reliably annotated corpora is one of the serious
bottlenecks for processing natural language. Producing and maintaining
corpora is a difficult task that usually requires sizeable funding and
the cooperation of several experts. Although having such corpora
available is essential, the many difficulties and the amount of work
needed to produce reliable corpora make creating this data and making
it available a non-trivial proposition. Producing reliable corpora
continues to be an invisible task in Natural Language Processing.
Especially when working on languages other than English, on
smaller datasets not immediately suitable for machine learning
approaches, or on a new release of a previous dataset, it is often
unclear to corpus creators how to publish and properly discuss
their work. Most of the biggest Natural Language Processing venues do
not accept corpus descriptions. The situation is even worse
when considering minority and endangered languages since most of them
do not have a related venue where these works can be discussed.

The Latin American and Iberian communities that produce open corpora
have yet to establish an event allowing experts to share ideas,
discuss difficulties, and get feedback on their work. Different
meetings have been held in recent years, but either they were not
generic enough to embrace all corpus work done in these communities,
or they lacked continuity and support for future editions. Due to
these conditions, it is common for groups that share related interests
or face the same difficulties to be unaware of other groups and their
recent work within these communities.

This forum aims both to fill the gap left by the lack of a permanent
venue for the construction, annotation, and maintenance of open
corpora for Latin American and Iberian languages and to create an
extensive list of these resources. OpenCor welcomes discussions on
Portuguese,
Spanish, indigenous languages, creoles, Galician, Catalan, Aragonese,
Astur-Leonese, Aranese, and other languages spoken in Latin America or
Iberian countries. Work on endangered, minority, and/or less-resourced
languages is particularly welcome.

This is the fifth edition of OpenCor Forum, a forum to gather the
community that produces, maintains, and distributes freely available
language resources for the large variety of languages spoken in
Iberian countries and Latin America. All accepted works will also be
part of the OpenCor list, an initiative to catalogue the open
resources produced for the targeted languages. This forum welcomes,
but is not restricted to, the following topics:

releases of new open data sets
descriptions of established open corpora
guidelines creation, annotation strategies, and best practices discussion
corpora maintenance and management
corpora curation and assessment
corpora design and evaluation
corpora creation strategies and difficulties faced by the community
ethical aspects of corpora creation

------------------------------------------------------------------------------

LINGUIST List is supported by the following publishers:

Wiley http://www.wiley.com
