26.2203, FYI: Release of Autodesk Post-Editing Data Corpus
The LINGUIST List via LINGUIST
linguist at listserv.linguistlist.org
Mon Apr 27 19:05:59 UTC 2015
LINGUIST List: Vol-26-2203. Mon Apr 27 2015. ISSN: 1069 - 4875.
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org
Editor for this issue: Ashley Parker <ashley at linguistlist.org>
================================================================
Date: Mon, 27 Apr 2015 15:04:25
From: Ventsislav Zhechev [contact at ventsislavzhechev.eu]
Subject: Release of Autodesk Post-Editing Data Corpus
Dear all,
It is my pleasure to announce the release of the Autodesk Post-Editing Data corpus with the ISLRN 290-859-676-529-5 (http://www.islrn.org/resources/identify_islrn/).
This resource contains parallel English source–MT/TM target segments post-edited into several languages (Simplified and Traditional Chinese, Czech, French, German, Hungarian, Italian, Japanese, Korean, Polish, Brazilian Portuguese, Russian, Spanish), with between 30,000 and 410,000 segments per language. Its main intended use is research in automatic quality estimation of Machine Translation output. The data are predominantly software user manual content, with some segments coming from marketing and education materials. They cover the portfolio of Autodesk products from various domains, notably architecture, engineering, civil engineering, simulation, computer graphics, media and entertainment. The content was translated in the period from 2012.11.12 to 2014.09.23.
The corpus is available from https://autodesk.box.com/Autodesk-PostEditing and more information is available in the included Readme file. The data are released under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Regards,
Dr. Ventsislav Zhechev
Computational Linguist, Certified ScrumMaster
Platform Architecture and Technologies
Localisation Services
MAIN +41 32 723 91 22
FAX +41 32 723 93 99
http://VentsislavZhechev.eu
Autodesk, Inc.
Rue de Puits-Godet 6
2000 Neuchâtel, Switzerland
www.autodesk.com
Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics
Subject Language(s): Chinese, Mandarin (cmn)
                     French (fra)
                     German (deu)
                     Hungarian (hun)
                     Italian (ita)
                     Japanese (jpn)
                     Korean (kor)
                     Polish (pol)
                     Portuguese (por)
                     Russian (rus)
                     Spanish (spa)