36.1825, Confs: Terminology Translation Shared Task @WMT2025 (China)
The LINGUIST List
linguist at listserv.linguistlist.org
Wed Jun 11 17:05:02 UTC 2025
LINGUIST List: Vol-36-1825. Wed Jun 11 2025. ISSN: 1069 - 4875.
Subject: 36.1825, Confs: Terminology Translation Shared Task @WMT2025 (China)
Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Valeriia Vyshnevetska <valeriia at linguistlist.org>
================================================================
Date: 10-Jun-2025
From: Kirill Semenov [kirill.semenov at uzh.ch]
Subject: Terminology Translation Shared Task @WMT2025
Terminology Translation Shared Task @WMT2025
Date: 05-Nov-2025 - 09-Nov-2025
Location: Suzhou, China
Meeting URL: https://www2.statmt.org/wmt25/terminology.html
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics; Translation
Subject Language(s): Chinese (zho)
English (eng)
German (deu)
Russian (rus)
Spanish (spa)
Submission Deadline: 15-Jul-2025
Terminology Translation Task at WMT2025 - Call for Participation
We are excited to announce the third Shared Task on Terminology
Translation, which will be run as part of the 10th Conference on Machine
Translation (WMT2025) in Suzhou, China.
TL;DR:
- We test sentence-level and document-level translation of texts in
the finance and IT domains, given explicit terminology. The language
pairs are: English -> {Spanish, German, Russian, Chinese}, Chinese ->
English.
- We evaluate overall translation quality, terminology success rate,
and terminology consistency. Additionally, we compare system
performance when given no terminology, proper terminology, and random
terms.
- The task starts on 15th June 2025 AoE; the submission deadline is
15th July 2025 AoE.
- Please pre-register via Google Forms:
https://forms.gle/ZSn2pNJkQJAzHFnA6
Overview:
Advances in neural MT and LLM-assisted translation over the last
decade have brought near-human quality to general-domain translation,
at least for high-resource languages. However, in specialized domains
such as scientific, financial, or legal texts, where the correct and
consistent use of special terms is crucial, the task is far from
solved. The Terminology Shared Task aims to assess the extent to which
machine translation models can use additional information about how
terms should be translated. Compared to the two previous editions
(2021 and 2023), the new test data include more varied test cases, are
more consistent in domain within each translation direction, and cover
more languages.
Task Description:
Track №1: Sentence/Paragraph-Level Translation
You will be provided with a sequence of sentence- or paragraph-long
input texts, along with small terminology dictionaries that contain
only the terms present in the given text.
Language Pairs:
en-de (English - German)
en-ru (English - Russian)
en-es (English - Spanish)
Domains:
Information technology
Track №2: Document-Level Translation
The setup is similar to Track №1, with two exceptions: each input text
is now a whole document, and the dictionaries cover the whole set of
input texts (i.e. they are corpus-level). This makes the task closer to
the real-life setup (where dictionaries exist independently of the
texts), although it may complicate implementation (solutions that need
to store the whole dictionary will require more memory). Additionally,
in the document-level setup, consistent use of terms becomes more
important.
Language Pairs:
en-zh-Hant (English - Traditional Chinese)
zh-Hant-en (Traditional Chinese - English)
Domains:
Finance
Evaluation - Terminology Modes:
You are expected to compare your system’s performance under three
modes:
1. No terminology: the system is only provided with input
sentences/documents.
2. Proper terminology: the system is provided with input texts (same
as 1.) and dictionaries of the format {source_term: target_term}.
3. Random terminology: the system is provided with input texts and
translation dictionaries of the same format as in 2. The difference is
that the dictionary items are not special terms but words randomly
drawn from input texts. This mode is of special interest since it lets
us measure to what extent proper term translations (mode 2) improve
system performance, as opposed to arbitrary additional input that does
not contain domain-specific terminology.
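To make the three modes concrete, here is a minimal Python sketch of how per-segment inputs for each mode might be assembled. The function name `build_mode_inputs`, the `aligned_pairs` word alignments used to obtain target-side words for the random entries, and the choice to give the random dictionary as many entries as the proper one are all hypothetical assumptions, not the organizers' actual procedure.

```python
import random


def build_mode_inputs(source_text, proper_terms, aligned_pairs, seed=0):
    """Return the (text, dictionary) input for each of the three modes.

    proper_terms: {source_term: target_term} for this segment (mode 2).
    aligned_pairs: HYPOTHETICAL list of (source_word, target_word) pairs
    for ordinary words of the segment, e.g. produced by a word aligner.
    Assumption: the random dictionary has as many entries as the proper one.
    """
    rng = random.Random(seed)
    # Drop real terms (mode 3 uses non-terms) and deduplicate by source word;
    # sorting keeps sampling reproducible for a fixed seed.
    candidates = sorted(
        {s: t for s, t in aligned_pairs if s not in proper_terms}.items()
    )
    k = min(len(proper_terms), len(candidates))
    random_terms = dict(rng.sample(candidates, k))
    return {
        "no_terminology": (source_text, {}),
        "proper_terminology": (source_text, dict(proper_terms)),
        "random_terminology": (source_text, random_terms),
    }
```

The same text is sent in all three modes and only the dictionary changes, so any difference in scores can be attributed to the terminology input.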
Evaluation - Metrics:
1. Overall Translation Quality: we will evaluate general aspects of
the MT output such as fluency, adequacy, and grammaticality, using
standard automatic MT metrics such as BLEU or COMET. In addition, we
will pay special attention to the grammaticality of the translated
terms.
2. Terminology Success Rate: this metric assesses how accurately the
system translates technical terms given the specialized vocabulary. It
is computed by comparing the correct term translations (i.e. those
present in the dictionary) with the terms in the output. A higher
success rate indicates closer adherence to the dictionary translations.
3. Terminology Consistency: for domains such as science or legal
texts, the consistent use of an introduced term throughout the text is
crucial. In other words, we want a system not only to pick the correct
term in the target language but also to use it consistently once it is
chosen. This will be evaluated by comparing all translations of a
given source term in a text and measuring the percentage of deviations
from the most consistent translation. This metric is more important
for the Document-Level track, but it will be used for both tracks.
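As a rough illustration of metrics 2 and 3, the Python sketch below computes a success rate and a consistency score. It is not the official scorer: the function names are hypothetical, success is assumed to mean a case-insensitive substring match of the dictionary's target term in the output, and "deviation" is assumed to mean any occurrence translated differently from the most frequent translation of that source term.

```python
from collections import Counter


def term_success_rate(outputs, dictionaries):
    """Share of expected target terms found in the outputs.

    outputs: list of translated segments.
    dictionaries: list of {source_term: target_term}, one per segment.
    Assumption: plain case-insensitive substring matching; an official
    scorer may tokenize, normalize, or lemmatize differently.
    """
    hits = total = 0
    for output, terms in zip(outputs, dictionaries):
        lowered = output.lower()
        for target_term in terms.values():
            total += 1
            hits += target_term.lower() in lowered  # True counts as 1
    return hits / total if total else 0.0


def term_consistency(observed):
    """Percentage of term occurrences deviating from the majority translation.

    observed: {source_term: [translation used at each occurrence]}.
    How occurrences are aligned to their translations is left open here.
    """
    deviations = total = 0
    for translations in observed.values():
        if not translations:
            continue
        majority_count = Counter(translations).most_common(1)[0][1]
        deviations += len(translations) - majority_count
        total += len(translations)
    return 100.0 * deviations / total if total else 0.0
```

For example, if "cache" is rendered as "Cache", "Cache", and "Zwischenspeicher" and "server" once as "Server", one of four occurrences deviates, giving 25.0. Substring matching is deliberately simple and can over-count (a term found inside a longer compound still matches), so a real evaluation would likely need morphology-aware matching, especially for German and Russian.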
Important Dates:
All dates are end of day, Anywhere on Earth (AoE).
Data snippets released: 7th May 2025
Dev data released: 22nd May 2025
Test data release, task starts: 15th June 2025
Submission deadline: 15th July 2025
Paper submission to WMT25: in-line with WMT25
Camera-ready submission to WMT25: in-line with WMT25
Conference in Suzhou, China: 05-09 November 2025
Submission Guidelines:
0. Please notify us about your participation prior to submission. This
is optional, but it will help us estimate our workload after
submission. Please do it through this Google Form:
https://forms.gle/ZSn2pNJkQJAzHFnA6
1. Check your submission files with the validation script, which will
be published together with the test data.
2. Write a description of your system (optional).
3. Submit your system via Google Forms. The Google Form with all
necessary submission details will be published on the test data
release date.
All details on submission as well as FAQ can be found at the webpage
of the shared task.
Organizers:
- Kirill Semenov (University of Zurich), main contact: FirstNаmе
[dоt] LаstNаmе {аt} uzh /dоt/ сh
- Nathaniel Berger (Heidelberg University)
- Pinzhen Chen (University of Edinburgh & Aveni.ai)
- Xu Huang (Nanjing University)
- Arturo Oncevay (JP Morgan)
- Dawei Zhu (Amazon)
- Vilém Zouhar (ETH Zurich)
Website:
https://www2.statmt.org/wmt25/terminology.html
If you have any questions, please email Kirill Semenov (see the
address above).
------------------------------------------------------------------------------
********************** LINGUIST List Support ***********************
Please consider donating to the Linguist List to support the student editors:
https://www.paypal.com/donate/?hosted_button_id=87C2AXTVC4PP8
LINGUIST List is supported by the following publishers:
Bloomsbury Publishing http://www.bloomsbury.com/uk/
Cambridge University Press http://www.cambridge.org/linguistics
Cascadilla Press http://www.cascadilla.com/
De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton
Edinburgh University Press http://www.edinburghuniversitypress.com
Elsevier Ltd http://www.elsevier.com/linguistics
John Benjamins http://www.benjamins.com/
Language Science Press http://langsci-press.org
Lincom GmbH https://lincom-shop.eu/
MIT Press http://mitpress.mit.edu/
Multilingual Matters http://www.multilingual-matters.com/
Netherlands Graduate School of Linguistics / Landelijke (LOT) http://www.lotpublications.nl/
Oxford University Press http://www.oup.com/us
Wiley http://www.wiley.com