LINGUIST List: Vol-13-1535. Tue May 28 2002. ISSN: 1068-4875.
Subject: 13.1535, Diss: Computational Ling: Eisner "Smoothing..."
Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
Reviews (reviews at linguistlist.org):
Simin Karimi, U. of Arizona
Terence Langendoen, U. of Arizona
Consulting Editor:
Andrew Carnie, U. of Arizona <carnie at linguistlist.org>
Editors (linguist at linguistlist.org):
Karen Milligan, WSU
Naomi Ogasawara, EMU
James Yuells, EMU
Marie Klopfenstein, WSU
Michael Appleby, EMU
Heather Taylor, EMU
Ljuba Veselinova, Stockholm U.
Richard John Harvey, EMU
Dina Kapetangianni, EMU
Renee Galvis, WSU
Karolina Owczarzak, EMU
Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
Home Page: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Karolina Owczarzak <karolina at linguistlist.org>
=================================Directory=================================
1)
Date: Sat, 25 May 2002 02:07:18 +0000
From: jason at cs.jhu.edu
Subject: Computational Ling: Eisner "Smoothing a Probabilistic Lexicon..."
-------------------------------- Message 1 -------------------------------
Date: Sat, 25 May 2002 02:07:18 +0000
From: jason at cs.jhu.edu
Subject: Computational Ling: Eisner "Smoothing a Probabilistic Lexicon..."
New Dissertation Abstract
Institution: University of Pennsylvania
Program: Computer and Information Science
Dissertation Status: Completed
Degree Date: 2001
Author: Jason Michael Eisner
Dissertation Title:
Smoothing a Probabilistic Lexicon Via Syntactic Transformations
Dissertation URL: http://cs.jhu.edu/~jason/papers/#thesis01
Linguistic Field: Computational Linguistics
Dissertation Director 1: Mitchell P. Marcus
Dissertation Abstract:
Probabilistic parsing requires a lexicon that specifies each word's
syntactic preferences in terms of probabilities. To estimate these
probabilities for words that were poorly observed during training,
this thesis assumes the existence of arbitrarily powerful
transformations (also known to linguists as lexical redundancy rules
or metarules) that can add, delete, retype or reorder the argument and
adjunct positions specified by a lexical entry. In a given language,
some transformations apply frequently and others rarely. We describe
how to estimate the rates of the transformations from a sample of
lexical entries. More deeply, we learn which properties of a
transformation increase or decrease its rate in the language. As a
result, we can smooth the probabilities of lexical entries. Given
enough direct evidence about a lexical entry's probability, our
Bayesian approach trusts the evidence; but when little or no evidence
is available, it relies more on the transformations' rates
to guess how often the entry will be derived from related
entries.
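
To make the smoothing idea concrete, here is a minimal Python sketch:
it blends a lexical entry's observed relative frequency with a prior
probability that, in the thesis, would come from the transformation
rates. The Dirichlet-style pseudo-count scheme, the function name, and
all numbers are illustrative assumptions, not details taken from the
dissertation.

def smoothed_prob(entry_count, word_count, prior_prob, prior_strength=10.0):
    # Shrink the entry's relative frequency toward the transformation-derived
    # prior; the more often the word was observed, the less the prior matters.
    return (entry_count + prior_strength * prior_prob) / (word_count + prior_strength)

# A well-observed entry stays close to its relative frequency ...
print(smoothed_prob(entry_count=80, word_count=100, prior_prob=0.1))  # ~0.74
# ... while a barely observed word leans on the transformation prior.
print(smoothed_prob(entry_count=0, word_count=2, prior_prob=0.1))     # ~0.08
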
Abstractly, the proposed "transformation models" are probability
distributions that arise from graph random walks with a log-linear
parameterization. A domain expert constructs the parameterized graph,
and a vertex is likely according to whether random walks tend to halt
at it. Transformation models are suited to any domain where "related"
events (as defined by the graph) may have positively covarying
probabilities. Such models admit a natural prior that favors simple
regular relationships over stipulative exceptions. The model
parameters can be locally optimized by gradient-based methods or by
Expectation-Maximization. Exact algorithms (matrix inversion) and
approximate ones (relaxation) are provided, with
optimizations. Variations on the idea are also discussed. We
compare the new technique empirically to previous techniques from the
probabilistic parsing literature, using comparable features, and
obtain a 20% perplexity reduction (similar to doubling the amount of
training data). Some of this reduction is shown to stem from the
transformation model's ability to match observed probabilities, and
some from its ability to generalize. Model averaging yields a final
24% perplexity reduction.
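
A minimal sketch of the random-walk view, assuming a toy three-vertex
graph with invented feature vectors and parameters: edge scores are
log-linear in the parameters, each vertex may halt the walk, and the
exact halting distribution is computed by matrix inversion, as the
abstract describes. The specific graph, features, and values are
hypothetical, not taken from the thesis.

import numpy as np

def transition_matrix(features, theta, halt_score=1.0):
    # Log-linear edge scores: score(u, v) = exp(theta . f(u, v)).
    scores = np.exp(features @ theta)
    # Each vertex also has a "halt" action; normalizing over edges plus halt
    # makes the edge matrix sub-stochastic.
    z = scores.sum(axis=1) + halt_score
    T = scores / z[:, None]       # probability of continuing along each edge
    halt = halt_score / z         # probability of halting at each vertex
    return T, halt

def halting_distribution(start, T, halt):
    # Expected visits to each vertex: start @ (I - T)^{-1}
    # (the exact algorithm by matrix inversion mentioned above).
    n = T.shape[0]
    visits = start @ np.linalg.inv(np.eye(n) - T)
    return visits * halt          # probability of halting at each vertex

# Toy 3-vertex graph with two invented binary features per directed edge.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(3, 3, 2)).astype(float)
theta = np.array([0.5, -1.0])      # hypothetical log-linear parameters
start = np.array([1.0, 0.0, 0.0])  # the walk starts at vertex 0

T, halt = transition_matrix(features, theta)
p = halting_distribution(start, T, halt)
print(p, p.sum())                  # a proper distribution: sums to 1

The halting probabilities sum to one because on each visit the walk
either continues along a scored edge or halts, so the model defines a
proper distribution over vertices.
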
---------------------------------------------------------------------------
LINGUIST List: Vol-13-1535