13.2127, Diss: Computational Ling: Yamada "Syntax-based..."

Mon Aug 19 18:15:37 UTC 2002

LINGUIST List:  Vol-13-2127. Mon Aug 19 2002. ISSN: 1068-4875.

Subject: 13.2127, Diss: Computational Ling: Yamada "Syntax-based..."

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Consulting Editor:
        Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	James Yuells, EMU		Marie Klopfenstein, WSU
	Michael Appleby, EMU		Heather Taylor, EMU
	Ljuba Veselinova, Stockholm U.	Richard John Harvey, EMU
	Dina Kapetangianni, EMU		Renee Galvis, WSU
	Karolina Owczarzak, EMU		Anita Wang, EMU

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
          Zhenwei Chen, E. Michigan U. <zhenwei at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Karolina Owczarzak <karolina at linguistlist.org>

=================================Directory=================================

1)
Date:  Sat, 17 Aug 2002 16:32:00 +0000
From:  kyamada at isi.edu
Subject:  Computational Ling: Yamada "Syntax-based Statistical..."

-------------------------------- Message 1 -------------------------------

New Dissertation Abstract

Institution: University of Southern California
Program: Information Sciences Institute
Dissertation Status: Completed
Degree Date: 2002

Author: Kenji Yamada

Dissertation Title:
A Syntax-based Statistical Translation Model

Linguistic Field: Computational Linguistics
Subject Language: Japanese, English, Chinese, Mandarin

Dissertation Director 1: Kevin Knight
Dissertation Director 2: Eduard Hovy
Dissertation Director 3: Paul Rosenbloom
Dissertation Director 4: Daniel Marcu

Dissertation Abstract:
A statistical translation model is a mathematical model for the
process of human-language translation. Model parameters are
automatically estimated using a corpus of translation pairs. This is
in contrast to conventional rule-based machine translation systems, in
which lexical, syntactic, and semantic translation rules are manually
crafted by language experts over several years.
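
As a rough illustration of what "automatically estimated using a corpus
of translation pairs" can mean, the sketch below runs a few rounds of
expectation-maximization in the style of a word-to-word model (IBM
Model 1, simplified, no NULL word). It is not the model of this thesis.
The function name, the toy English-German sentence pairs, and the
iteration count are all assumptions chosen for the example.

```python
from collections import defaultdict


def train_word_translation(pairs, iterations=10):
    """Estimate t(f | e) from sentence pairs with EM (IBM Model 1 style)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform distribution over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (f, e) link
        total = defaultdict(float)  # expected count of links leaving e
        # E-step: distribute each foreign word over the words it may align to.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy corpus (hypothetical): (source words, foreign words) per sentence pair.
pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = train_word_translation(pairs)
```

No translation rule is written by hand here: the table `t` comes only
from co-occurrence in the sentence pairs, which is the contrast with
rule-based systems drawn above.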

The idea of statistical machine translation was first proposed in the
late 1940s, but the computational power available at that time was
insufficient. In the last decade, word-to-word statistical translation
models regained researchers' interest, owing to increasing computational
power and a growing volume of online training materials.

This thesis introduces a more advanced statistical translation model
that better exploits these growing resources. Most statistical
translation models are based on word-to-word translation, i.e., the
model's operations apply to each word independently. We present a
new model that translates a syntactic parse tree into a foreign-language
sentence; its operations apply to each node of the parse tree. To
obtain the parse tree, we use an existing parser developed elsewhere,
which lets us take advantage of available linguistic resources within
a statistical framework. With a syntactic parser, we can use the rich
syntactic information embedded in a sentence and model more
linguistically motivated word movements in translation. We use a
parser only for the channel input, so our model works for translation
from any linguistically resource-poor language into a resource-rich
language such as English.
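
To make the contrast with word-to-word models concrete, the sketch
below applies one operation at each node of a small parse tree and
reads off the output sentence. The abstract does not name the node
operations, so the two used here (reordering a node's children and
translating a leaf word) are assumptions, as are the `Node` class, the
deterministic lookup tables, and the romanized Japanese glosses. A
statistical model would attach probabilities to these choices rather
than fixed table entries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # syntactic category, e.g. "VP"
    children: list = field(default_factory=list)  # empty for leaves
    word: str = ""                                # set only on leaves


def transform(node, reorder, lexicon):
    """Apply one operation per node: permute children, or translate a leaf."""
    if not node.children:
        # Leaf: look the word up; unknown words pass through unchanged.
        return [lexicon.get(node.word, node.word)]
    child_labels = tuple(c.label for c in node.children)
    # Hypothetical table: child label sequence -> new child order (indices).
    order = reorder.get(child_labels, range(len(node.children)))
    output = []
    for i in order:
        output.extend(transform(node.children[i], reorder, lexicon))
    return output


# Channel input: a parse tree of the English sentence "he reads books".
tree = Node("S", [
    Node("NP", word="he"),
    Node("VP", [Node("VB", word="reads"), Node("NP", word="books")]),
])
reorder = {("VB", "NP"): (1, 0)}  # assumed rule: move the verb after its object
lexicon = {"he": "kare-wa", "reads": "yomu", "books": "hon-o"}  # toy entries
result = transform(tree, reorder, lexicon)
```

Because the reordering rule is keyed on the labels of a node's
children, a single entry moves a whole constituent, which a model that
treats each word independently cannot express directly.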

We have developed an efficient training algorithm and an experimental
decoding program for the syntax-based translation model. We
demonstrate that alignment accuracy for Japanese-English is more
than 30% better with our model than with previous word-to-word models,
and that decoding performance is 10-40% better for
Chinese-English and Arabic-English translation.
