33.2140, Confs: Comp Ling, Morphology, Text/Corpus Ling, Translation/USA

The LINGUIST List linguist at listserv.linguistlist.org
Mon Jun 27 03:11:10 UTC 2022


LINGUIST List: Vol-33-2140. Mon Jun 27 2022. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Billy Dickson
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Everett Green, Sarah Goldfinch, Nils Hjortnaes,
        Joshua Sims, Billy Dickson, Amalia Robinson, Matthew Fort
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Hosted by Indiana University

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Mon, 27 Jun 2022 03:10:14
From: John Ortega [jortega at cs.nyu.edu]
Subject: First Workshop on Corpus Generation and Corpus Augmentation for Machine Translation

 
First Workshop on Corpus Generation and Corpus Augmentation for Machine Translation 
Short Title: CoCo4MT 

Date: 16-Sep-2022 - 16-Sep-2022 
Location: Orlando (and hybrid), USA 
Contact: John Ortega 
Contact Email: coco4mt2022 at googlegroups.com 
Meeting URL: https://sites.google.com/view/coco4mt 

Linguistic Field(s): Computational Linguistics; Morphology; Text/Corpus Linguistics; Translation 

Meeting Description: 

The First Workshop on Corpus Generation and Corpus Augmentation for Machine
Translation (CoCo4MT) will be co-located with AMTA 2022 in Orlando, Florida,
USA on September 16th, 2022. 

CoCo4MT sets out to be the first workshop centered on research into corpus
creation, cleansing, and augmentation techniques specifically for machine
translation.

We hope that submissions will provide high-quality corpora that are publicly
available for download and can be used to improve machine translation
performance, thus encouraging the creation of new datasets for multiple
languages. In turn, we hope the workshop will become a general venue to
consult for corpora needs in the future.

Topics of the workshop include but are not limited to:
- Difficulties with using existing corpora (e.g., political considerations or
domain limitations) and their effects on final MT systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques, 
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for training MT
systems.
