32.3486, FYI: BigScience Data Sourcing Hackathon

The LINGUIST List linguist at listserv.linguistlist.org
Thu Nov 4 12:01:27 UTC 2021


LINGUIST List: Vol-32-3486. Thu Nov 04 2021. ISSN: 1069-4875.

Subject: 32.3486, FYI: BigScience Data Sourcing Hackathon

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderators: Jeremy Coburn, Lauren Perkins
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Nils Hjortnaes, Joshua Sims, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Thu, 04 Nov 2021 08:00:47
From: Francesco De Toni [francesco.detoni at uwa.edu.au]
Subject: BigScience Data Sourcing Hackathon

 
BigScience is a year-long, open, collaborative scientific workshop in which 600
researchers from 50 countries and more than 250 institutions are working
together to create a very large multilingual neural network language model,
trained on a very large multilingual text dataset. During the workshop, the
participants plan to investigate the dataset and the model from all angles:
bias, social impact, capabilities, limitations, ethics, potential improvements,
performance in specific domains, carbon impact, and the general AI/cognitive
research landscape.

The BigScience Data Sourcing working group has launched a hackathon to
document and collect a multilingual dataset of language sources in accordance
with the BigScience open-research governance principles. We are looking to
gather a wide variety of resources that represent different kinds of language
use: different regions, different contexts, and different audiences. To
capture as many of these variations as possible, we need to look beyond
traditional web sources to other data types and formats, such as books and
formal publications, audio formats including radio and podcasts, and more.

Read about the hackathon and join here:
https://github.com/bigscience-workshop/data_sourcing/wiki
Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics

----------------------------------------------------------
LINGUIST List: Vol-32-3486	
----------------------------------------------------------