35.3069, FYI: Updated languages: Looking for linguist native speakers of various languages

Tue Nov 5 16:05:02 UTC 2024

LINGUIST List: Vol-35-3069. Tue Nov 05 2024. ISSN: 1069 - 4875.

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Joel Jenkins <joel at linguistlist.org>

================================================================

Date: 28-Oct-2024
From: Loretta Gasparini [lgasparini at student.unimelb.edu.au]
Subject: Updated languages: Looking for linguist native speakers of various languages

Dear colleagues,

Apologies for reposting – I have some new languages we could include
that I previously did not mention, which I think could be of interest.

A team of us at The University of Melbourne and our industry partner
Redenlab (https://redenlab.com/) are working on a pipeline for
automated part-of-speech (POS) tagging across different languages. We
are looking for linguists who are native speakers of various languages.

UPDATED LIST OF LANGUAGES FOR WHICH WE ARE SEEKING NATIVE SPEAKERS:
Afrikaans, Amharic, Asturian, Belarusian, Bengali, Bulgarian, Danish,
Estonian, Finnish, French (Canada), Hebrew, Hungarian, Icelandic,
Kazakh, Korean, Kyrgyz, Latvian, Macedonian, Malayalam, Maltese,
Marathi, Romanian, Slovak, Slovenian, Spanish (Argentina), Spanish
(Cuba), Spanish (Mexico), Spanish (Spain), Swedish, Tagalog, Tamil,
Telugu, Thai, Urdu, Uyghur, Welsh, Wolof, Yoruba

I have received enough interest for the following languages (thank you
to everyone who has already reached out): Arabic, Armenian, Basque,
Catalan, Chinese (Traditional), Chinese (Simplified), Croatian, Czech,
Dutch, Farsi/Persian, French (France), Galician, German, Greek, Hindi,
Indonesian, Italian, Japanese, Lithuanian, Norwegian, Polish,
Portuguese (Brazil), Portuguese (Portugal), Russian, Serbian, Spanish
(Chile), Turkish, Ukrainian, Vietnamese.

We could additionally include other languages I have not mentioned if:

(1) There is a Natural Language Processing (NLP) library with
part-of-speech tagging capability that supports that language. For
example, see all the languages currently supported by Stanza
(https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-5150).

(2) We can source a translation of the North Wind & Sun fable in that
language. Many ‘Illustrations of the IPA’ papers contain a translation
(https://scholars.sil.org/kenneth_s_olson/ipa_illustrations).
Alternatively, you may be able to provide the translation; I can send
the English passage and translations in other languages if helpful.

Apologies that this excludes many languages. Here is information on
how to add new languages to Stanza
(https://stanfordnlp.github.io/stanza/new_language.html). We would
welcome replications of our current study with new languages as they
become available in NLP libraries in future.

THE WORK: We're looking for 1-2 linguists to label the parts of speech
in a 120-word passage in their native language (estimated at 2 hours
maximum). The work may take longer if we also need to source a
translation. We are aiming for the translation (if needed) and POS
tagging to be completed by the end of November 2024.

THE PROJECT: We will then compare the manually labelled parts of
speech with the output of available automated methods. This work will
be unpaid, but we will write it up as a journal article and include
everyone who does any part-of-speech tagging as a co-author, as part
of a consortium. We aim to have the paper ready to submit in early
2025.
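
For anyone curious what such a comparison might look like in practice,
here is a minimal sketch in Python. It is an illustration only, not
our actual pipeline: the function names, the example tokens, and both
tag sequences are made up for the example, and the tags are assumed to
follow the Universal Dependencies UPOS labels. The optional
stanza_upos helper assumes the third-party Stanza package and a
downloaded model for the language; it is not called in the example.

```python
from collections import Counter


def tagging_agreement(manual, automated):
    """Return (accuracy, confusions) for two token-aligned tag sequences.

    accuracy is the share of tokens where both sequences agree;
    confusions counts each (manual_tag, automated_tag) disagreement.
    """
    if len(manual) != len(automated):
        raise ValueError("tag sequences must be aligned token by token")
    if not manual:
        raise ValueError("tag sequences must not be empty")
    pairs = list(zip(manual, automated))
    matches = sum(m == a for m, a in pairs)
    confusions = Counter((m, a) for m, a in pairs if m != a)
    return matches / len(pairs), confusions


def stanza_upos(tokens, lang):
    """Optional helper: tag pre-tokenized text with Stanza (assumption:
    Stanza is installed and a model for `lang` is available)."""
    import stanza  # third-party; imported lazily so the rest runs without it

    nlp = stanza.Pipeline(
        lang=lang, processors="tokenize,pos", tokenize_pretokenized=True
    )
    doc = nlp([tokens])  # one sentence, already split into tokens
    return [word.upos for sent in doc.sentences for word in sent.words]


# Hypothetical data: neither sequence is real annotator or tagger output.
tokens = ["The", "North", "Wind", "and", "the", "Sun", "were", "disputing"]
manual = ["DET", "PROPN", "PROPN", "CCONJ", "DET", "PROPN", "AUX", "VERB"]
automated = ["DET", "ADJ", "PROPN", "CCONJ", "DET", "PROPN", "AUX", "VERB"]

accuracy, confusions = tagging_agreement(manual, automated)
print(f"token-level accuracy: {accuracy:.3f}")
for (gold, predicted), count in confusions.most_common():
    print(f"manual {gold} vs automated {predicted}: {count}")
```

Token-level accuracy is only one possible measure. The confusion
counts show which tag pairs disagree, which may be more informative
than a single number for a 120-word passage.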

If you are a linguist (Bachelor's or higher degree in Linguistics) who
is a native speaker of any language fulfilling the above criteria,
feel free to email me (lgasparini at student.unimelb.edu.au) with 1-2
sentences about your degree and experience in Linguistics and any
questions, and I will get back to you with more info and next steps.

Regards,
Loretta (Lottie) Gasparini
PhD Candidate
The University of Melbourne
Email: lgasparini at student.unimelb.edu.au;
loretta.gasparini at mcri.edu.au

Linguistic Field(s): Computational Linguistics

