35.3069, FYI: Updated languages: Looking for linguist native speakers of various languages
The LINGUIST List
linguist at listserv.linguistlist.org
Tue Nov 5 16:05:02 UTC 2024
LINGUIST List: Vol-35-3069. Tue Nov 05 2024. ISSN: 1069-4875.
Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Joel Jenkins <joel at linguistlist.org>
================================================================
Date: 28-Oct-2024
From: Loretta Gasparini [lgasparini at student.unimelb.edu.au]
Subject: Updated languages: Looking for linguist native speakers of various languages
Dear colleagues,
Apologies for reposting – I have some new languages we could include
that I previously did not mention, which I think could be of interest.
A team of us at The University of Melbourne and our industry partner
Redenlab (https://redenlab.com/) are working on a pipeline for
automated part-of-speech tagging across different languages. We are
looking for linguist native speakers of various languages.
UPDATED LIST OF LANGUAGES FOR WHICH WE ARE SEEKING NATIVE SPEAKERS:
Afrikaans, Amharic, Asturian, Belarusian, Bengali, Bulgarian, Danish,
Estonian, Finnish, French (Canada), Hebrew, Hungarian, Icelandic,
Kazakh, Korean, Kyrgyz, Latvian, Macedonian, Malayalam, Maltese,
Marathi, Romanian, Slovak, Slovenian, Spanish (Argentina), Spanish
(Cuba), Spanish (Mexico), Spanish (Spain), Swedish, Tagalog, Tamil,
Telugu, Thai, Urdu, Uyghur, Welsh, Wolof, Yoruba
I have received enough interest for the following languages (thank you
to everyone who has already reached out): Arabic, Armenian, Basque,
Catalan, Chinese (Traditional), Chinese (Simplified), Croatian, Czech,
Dutch, Farsi/Persian, French (France), Galician, German, Greek, Hindi,
Indonesian, Italian, Japanese, Lithuanian, Norwegian, Polish,
Portuguese (Brazil), Portuguese (Portugal), Russian, Serbian, Spanish
(Chile), Turkish, Ukrainian, Vietnamese.
We could additionally include other languages I have not mentioned if:
(1) There is a Natural Language Processing (NLP) library with
part-of-speech tagging capability that supports that language. For
example, see all the languages currently supported by Stanza
(https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-5150).
(2) We can source a translation of the fable ‘The North Wind and the
Sun’ in that language. Many ‘Illustrations of the IPA’ papers contain
a translation
(https://scholars.sil.org/kenneth_s_olson/ipa_illustrations).
Alternatively, you may be able to provide the translation; I can send
the English passage and translations in other languages if helpful.
Apologies that this excludes many languages. Here is information on
how to add new languages to Stanza
(https://stanfordnlp.github.io/stanza/new_language.html). We would
welcome replications of our current study with new languages as they
become available in NLP libraries in future.
THE WORK: We're looking for 1-2 linguists to label the parts of speech
in a 120-word passage in their native language (estimated at 2 hours
maximum). This may take longer if we also need to source a
translation. We are aiming for the translation (if needed) and POS
tagging to be completed by the end of November 2024.
THE PROJECT: We will then compare the manually labelled parts of
speech with the output of available automated methods. This work is
unpaid, but we will write it up as a journal article and include
everyone who does any part-of-speech tagging as a co-author, as part
of a consortium. Once the tagging of the 120-word passage is complete
(by the end of November 2024), we aim to have a paper ready to submit
in early 2025.
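For illustration only, here is a minimal sketch of how manually
assigned tags might be compared with an automated tagger's output.
This is not the project's actual pipeline: the function names, the
choice of metrics (token-level accuracy and Cohen's kappa) and the
example tags are all assumptions, and the sketch presumes both
sequences are tokenised identically.

```python
from collections import Counter


def tag_accuracy(gold, predicted):
    """Proportion of tokens where the manual and automatic tags agree."""
    if len(gold) != len(predicted):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        raise ValueError("tag sequences must not be empty")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


def cohen_kappa(gold, predicted):
    """Agreement corrected for the agreement expected by chance."""
    n = len(gold)
    observed = tag_accuracy(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the two marginal tag proportions.
    expected = sum(gold_counts[t] * pred_counts[t] for t in gold_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one and the same tag throughout
    return (observed - expected) / (1 - expected)


def confusions(gold, predicted):
    """Count each (manual tag, automatic tag) pair that disagrees."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Hypothetical six-token example with made-up tags (not project data).
manual = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
automatic = ["DET", "NOUN", "VERB", "DET", "NOUN", "NOUN"]

print(tag_accuracy(manual, automatic))
print(cohen_kappa(manual, automatic))
print(confusions(manual, automatic))
```

The confusion counts show which tag pairs an automated method mixes
up, which a single accuracy figure would hide.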
If you are a linguist (Bachelor's or higher degree in Linguistics) who
is a native speaker of any language fulfilling the above criteria,
feel free to email me (lgasparini at student.unimelb.edu.au) with 1-2
sentences about your degree and experience in Linguistics and any
questions, and I will get back to you with more info and next steps.
Regards,
Loretta (Lottie) Gasparini
PhD Candidate
The University of Melbourne
Email: lgasparini at student.unimelb.edu.au;
loretta.gasparini at mcri.edu.au
Linguistic Field(s): Computational Linguistics