[RNLD List] First call for participation: MT4All Unsupervised MT Shared Task at SIGUL 2022

Mon Mar 14 11:55:15 UTC 2022

MT4All Unsupervised MT Shared Task

at SIGUL 2022

(24-25 June, Marseille)

FIRST CALL FOR PARTICIPATION

We invite you to participate in the first edition of the MT4All
Unsupervised Machine Translation Shared Task, hosted by the ELRA/ISCA
Special Interest Group on Under-Resourced Languages Workshop (SIGUL 2022).
Papers on the task will be published as part of the Proceedings.

Invitation to Participate – Expression of Interest:
<https://docs.google.com/forms/d/1tllq0jWhcKwMHgPtRCA4aLkgLDuN8JlZG7Vp4TqcNQ0>

TASK DESCRIPTION

For this Shared task, we will leverage the resources generated by the
recently completed CEF project MT4All, with the aim of exploring
unsupervised MT techniques based only on monolingual corpora. In the course
of the project, the following novel datasets were created: 18 monolingual
corpora for specific languages and domains, 12 bilingual dictionaries and
translation models, and 10 annotated datasets for evaluation. Most of them
will be used in the present Shared task.

The task is divided into three separate subtasks, each one covering a
specific domain and set of languages.

   - Subtask 1: Unsupervised translation from English to Ukrainian, Georgian,
     and Kazakh in the Legal domain.
   - Subtask 2: Unsupervised translation from English to Finnish, Latvian,
     and Norwegian Bokmål in the Financial domain.
   - Subtask 3: Unsupervised translation from English to German, Norwegian
     Bokmål, and Spanish in the Customer support domain.

In this Shared task, we are interested in how the in-domain monolingual
data that we will provide can be leveraged by creating a purely
unsupervised machine translation model, either by

   - training an unsupervised model from scratch, or
   - adding value to an existing pre-trained model, on the condition that
      - it has been trained on monolingual datasets
      - it has not been fine-tuned with any parallel data
      - it is publicly accessible from the HuggingFace repository

Although we exclude the possibility of fine-tuning the models with any
existing parallel data, we allow making use of the bilingual resources
created in the framework of MT4All using purely unsupervised technologies.

As additional monolingual data, only the monolingual OSCAR datasets may be
used.

IMPORTANT DATES

   - Training data release: 10.03.2022
   - Test sets release: 25.04.2022
   - Results deadline: 02.05.2022
   - Paper submission deadline: 16.05.2022
   - Acceptance notice: 30.05.2022
   - Camera ready: 13.06.2022
   - Workshop starts: 24.06.2022

Please visit the website for more details:
https://sigul-2022.ilc.cnr.it/mt4all-shared-task/

If you have any comments and/or questions, do not hesitate to contact
ksenia.kharitonova at bsc.es.