23.3230, Diss: Comp Ling/ English/ German/ Romanian: Gavrila: 'Improving Recombination...'
linguist at linguistlist.org
Mon Jul 30 14:53:16 UTC 2012
LINGUIST List: Vol-23-3230. Mon Jul 30 2012. ISSN: 1069 - 4875.
Subject: 23.3230, Diss: Comp Ling/ English/ German/ Romanian: Gavrila: 'Improving Recombination...'
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Veronika Drake, U of Wisconsin Madison
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
<reviews at linguistlist.org>
Homepage: http://linguistlist.org
Editor for this issue: Lili Xia <lxia at linguistlist.org>
================================================================
Date: Mon, 30 Jul 2012 10:52:31
From: Monica Gavrila [gavrila at informatik.uni-hamburg.de]
Subject: Improving Recombination in a Linear EBMT System by Use of Constraints
Institution: Universität Hamburg
Program: Department of Informatics
Dissertation Status: Completed
Degree Date: 2012
Author: Monica Gavrila
Dissertation Title: Improving Recombination in a Linear EBMT System by Use of
Constraints
Dissertation URL: http://ediss.sub.uni-hamburg.de/volltexte/2012/5758/
Linguistic Field(s): Computational Linguistics
Subject Language(s): English (eng)
German (deu)
Romanian (ron)
Dissertation Director(s):
Walther von Hahn
David Farwell
Wolfgang Menzel
Dissertation Abstract:
(Automatic) machine translation (MT) is one of the most challenging
domains in Natural Language Processing (NLP) and plays an important
role in ensuring global communication, especially in a multilingual world
with access to large amounts of Internet resources. As rule-based MT
approaches need manually developed resources, new MT directions
have been developed over the last twenty years, such as corpus-
based machine translation (CBMT): statistical MT (SMT) and example-
based machine translation (EBMT). These new directions are based
mainly on the existence of a parallel aligned corpus and, therefore, can
be easily employed for lower-resourced languages.
In this dissertation we showed how EBMT systems behave when a
lower-resourced inflecting language (i.e. Romanian) is involved in the
translation process. For this purpose we built an EBMT baseline
system based only on surface forms (the Lin-EBMT system). One of
our main goals was to investigate the impact of word-order constraints
on the translation results: we integrated constraints extracted from
generalized examples (i.e. templates) in Lin-EBMT and built an
extended system: Lin-EBMTREC+. Although constraints represent a
well-known method which is employed quite often in NLP, the use of
word-order constraints in an EBMT system is an innovative approach
which can open new paths in the domain of example-based MT. We
ran our experiments for two language-pairs in both directions of
translation: Romanian-German and Romanian-English. This aspect
raises interesting questions, as Romanian and German present
language-specific characteristics, which make the translation process
even more challenging. Both EBMT systems developed are easily
adaptable for other language-pairs. They are platform and language-
pair independent, provided that a parallel aligned corpus for the
language-pair exists and that the tools used for obtaining the needed
intermediate information (e.g. word alignment) are available. As a side
question, we studied how EBMT performs in comparison to SMT. We
compared the EBMT results obtained to results provided by a Moses-
based SMT system and the Google Translate on-line system. To
provide a complete view of CBMT, the performance of each MT
system was assessed in several experimental settings, using different
corpora (type and size), various system settings and additional part-of-
speech (POS) information. We evaluated the translation results by
means of three automatic evaluation metrics: BLEU, NIST and TER. A
subset of the results was manually analyzed for a better overview of
the translation quality.
Our experiments showed that constraints improve translation results,
although we could not determine which constraint combination works best.
Although the SMT system outperformed the EBMT
system in all experiments, the manual analysis provided cases in which
EBMT offered more accurate results. The behavior of the systems
while changing the experimental settings confirmed that (training and
test) data have a substantial impact on both MT approaches. The
difference between the results of the two MT approaches decreased
when a more restricted corpus was used. As expected, both CBMT
approaches worked better for shorter sentences.