14.1004, Diss: Computational Ling: Zaanen "Bootstrapping..."

LINGUIST List linguist at linguistlist.org
Fri Apr 4 14:09:13 UTC 2003

LINGUIST List:  Vol-14-1004. Fri Apr 4 2003. ISSN: 1068-4875.

Subject: 14.1004, Diss: Computational Ling: Zaanen "Bootstrapping..."

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Anita Yahui Huang <anita at linguistlist.org>

Date:  Thu, 03 Apr 2003 04:23:45 +0000
From:  mvzaanen at uvt.nl
Subject:  Computational Ling: Zaanen "Bootstrapping Structure into Language..."

-------------------------------- Message 1 -------------------------------

Institution: University of Leeds
Program: School of Computing
Dissertation Status: Completed
Degree Date: 2002

Author: Menno M. van Zaanen

Dissertation Title:

Bootstrapping Structure into Language: Alignment-Based Learning

Dissertation URL: http://ilk.uvt.nl/~mvzaanen/publications.html

Linguistic Field: Computational Linguistics

Dissertation Director 1: Rens Bod
Dissertation Director 2: Eric Atwell

Dissertation Abstract:

This thesis introduces a new unsupervised learning framework, called
Alignment-Based Learning, which is based on the alignment of sentences
and Harris's (1951) notion of substitutability. Instances of the
framework can be applied to an untagged, unstructured corpus of
natural language sentences, resulting in a labelled, bracketed version
of that corpus.

Firstly, in the alignment learning phase, the framework aligns all
sentences in the corpus in pairs, partitioning each pair into parts
that are equal in both sentences and parts that are
unequal. Unequal parts of sentences can be seen as being substitutable
for each other, since substituting one unequal part for the other
results in another valid sentence. The unequal parts of the sentences
are thus considered to be possible (possibly overlapping)
constituents, called hypotheses.
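
The alignment step described above can be illustrated with a minimal
sketch. It is not the thesis's implementation: it assumes
whitespace-tokenised sentences and uses Python's difflib.SequenceMatcher
as a stand-in aligner, and the function name align_pair is
hypothetical. Unequal parts are reported as half-open word-index spans,
one span per sentence.

```python
from difflib import SequenceMatcher


def align_pair(s1, s2):
    """Align two sentences and return their unequal parts.

    Each result is ((start1, end1), (start2, end2)): half-open word
    indices of a part of s1 and a part of s2 that differ and are
    therefore hypothesised to be substitutable for each other.
    """
    a, b = s1.split(), s2.split()
    # autojunk=False so frequent words are never discarded as "junk".
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    hypotheses = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, insert or delete: an unequal part
            hypotheses.append(((i1, i2), (j1, j2)))
    return hypotheses


print(align_pair("book a flight to Boston", "book a flight to New York"))
# [((4, 5), (4, 6))]
```

Here "Boston" and "New York" come out as a pair of hypotheses, since
replacing one with the other yields the other sentence.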

Secondly, the selection learning phase considers all hypotheses found
by the alignment learning phase and selects the best of these. The
hypotheses are selected based on the order in which they were found,
or based on a probabilistic function.
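
As a rough illustration of the two selection strategies, the sketch
below filters hypotheses for one sentence so that no two kept spans
cross each other. The names crosses, select_first and select_by_score
are hypothetical, and the score argument is a placeholder for whatever
probabilistic function is used; the abstract does not specify it.

```python
def crosses(x, y):
    """True if spans x and y overlap without one containing the other."""
    (a1, a2), (b1, b2) = x, y
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2


def select_first(hypotheses):
    """Keep each hypothesis unless it crosses one that was found earlier."""
    kept = []
    for h in hypotheses:
        if not any(crosses(h, k) for k in kept):
            kept.append(h)
    return kept


def select_by_score(hypotheses, score):
    """Same filter, but higher-scoring hypotheses are considered first."""
    return select_first(sorted(hypotheses, key=score, reverse=True))


# Half-open word-index spans in one sentence, in the order found.
found = [(0, 3), (2, 5), (3, 5)]
print(select_first(found))
# [(0, 3), (3, 5)]

# Assumed scores, for illustration only.
scores = {(0, 3): 0.2, (2, 5): 0.9, (3, 5): 0.5}
print(select_by_score(found, scores.get))
# [(2, 5), (3, 5)]
```

Nested spans are kept in both variants; only crossing spans are treated
as conflicting, since they cannot both be brackets in one tree.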

The framework can be extended with a grammar extraction phase.  This
extended framework is called parseABL.  In addition to returning a
structured version of the unstructured input corpus, as the ABL
system does, this system also returns a stochastic context-free or
tree substitution grammar.
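
The abstract does not say how parseABL extracts its grammar. As one
common way to read a stochastic context-free grammar off a bracketed
corpus, the sketch below counts the rewrite rules in each tree and
normalises by left-hand side (relative-frequency estimation). The tree
encoding as nested tuples, the labels S and X1, and the function names
are all assumptions made for illustration.

```python
from collections import Counter


def rules(tree):
    """Yield (lhs, rhs) for every internal node of a tuple tree.

    A tree is (label, child, ...), where a child is either a word
    (str) or another tree.
    """
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield label, rhs
    for c in children:
        if isinstance(c, tuple):
            yield from rules(c)


def extract_scfg(trees):
    """Map each rule to its relative frequency among rules with the same lhs."""
    counts = Counter(r for t in trees for r in rules(t))
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}


corpus = [
    ("S", "book", ("X1", "a", "flight")),
    ("S", "book", ("X1", "the", "ticket")),
]
grammar = extract_scfg(corpus)
print(grammar[("S", ("book", "X1"))])   # 1.0
print(grammar[("X1", ("a", "flight"))])  # 0.5
```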

Different instances of the framework have been tested on the English
ATIS corpus, the Dutch OVIS corpus and the Wall Street Journal
corpus. One of the interesting results, apart from the encouraging
numerical results, is that all instances can (and do) learn recursive
structures.
