23.3230, Diss: Comp Ling/ English/ German/ Romanian: Gavrila: 'Improving Recombination...'

linguist at linguistlist.org
Mon Jul 30 14:53:16 UTC 2012


LINGUIST List: Vol-23-3230. Mon Jul 30 2012. ISSN: 1069 - 4875.

Subject: 23.3230, Diss: Comp Ling/ English/ German/ Romanian: Gavrila: 'Improving Recombination...'

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
       <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Lili Xia <lxia at linguistlist.org>
================================================================  


Date: Mon, 30 Jul 2012 10:52:31
From: Monica Gavrila [gavrila at informatik.uni-hamburg.de]
Subject: Improving Recombination in a Linear EBMT System by Use of Constraints

Institution: Universität Hamburg 
Program: Department of Informatics 
Dissertation Status: Completed 
Degree Date: 2012 

Author: Monica Gavrila

Dissertation Title: Improving Recombination in a Linear EBMT System by Use of
Constraints 

Dissertation URL:  http://ediss.sub.uni-hamburg.de/volltexte/2012/5758/

Linguistic Field(s): Computational Linguistics

Subject Language(s): English (eng)
                     German (deu)
                     Romanian (ron)


Dissertation Director(s):
Walther von Hahn
David Farwell
Wolfgang Menzel

Dissertation Abstract:

Automatic machine translation (MT) is one of the most challenging
domains in Natural Language Processing (NLP) and plays an important
role in enabling global communication, especially in a multilingual
world with access to large amounts of Internet resources. Because
rule-based MT approaches require manually developed resources, new MT
directions have emerged over the last twenty years, most notably
corpus-based machine translation (CBMT), whose two main branches are
statistical MT (SMT) and example-based MT (EBMT). These approaches
rely mainly on the availability of a parallel aligned corpus and can
therefore be employed more easily for lower-resourced languages.

In this dissertation we show how EBMT systems behave when a
lower-resourced inflecting language (Romanian) is involved in the
translation process. For this purpose we built a baseline EBMT system
that uses only surface forms (the Lin-EBMT system). One of our main
goals was to investigate the impact of word-order constraints on the
translation results: we integrated constraints extracted from
generalized examples (i.e. templates) into Lin-EBMT, obtaining an
extended system, Lin-EBMTREC+. Although constraints are a well-known
technique employed quite often in NLP, using word-order constraints
in an EBMT system is an innovative approach that can open new paths
in example-based MT.

We ran our experiments for two language pairs in both translation
directions: Romanian-German and Romanian-English. These pairs raise
interesting questions, as Romanian and German have language-specific
characteristics that make the translation process even more
challenging. Both EBMT systems are easily adaptable to other language
pairs: they are platform- and language-pair-independent, provided that
a parallel aligned corpus for the language pair exists and that the
tools needed to obtain the required intermediate information (e.g.
word alignment) are available.

As a side question, we studied how EBMT performs compared with SMT,
comparing the EBMT results with those of a Moses-based SMT system and
of the online Google Translate system. To provide a complete view of
CBMT, the performance of each MT system was assessed in several
experimental settings, using different corpora (varying in type and
size), various system settings and additional part-of-speech (POS)
information. We evaluated the translation results with three automatic
evaluation metrics: BLEU, NIST and TER. A subset of the results was
also analyzed manually to give a better picture of translation quality.

Our experiments showed that constraints improve the translation
results, although it could not be clearly determined which combination
of constraints works best. While the SMT system outperformed the EBMT
system in all experiments, the manual analysis revealed cases in which
EBMT produced more accurate translations. The behavior of the systems
under changing experimental settings confirmed that the training and
test data have a substantial impact on both MT approaches. The gap
between the results of the two approaches narrowed when a more
restricted corpus was used. As expected, both CBMT approaches worked
better for shorter sentences.

----------------------------------------------------------
LINGUIST List: Vol-23-3230	
----------------------------------------------------------