14.1004, Diss: Computational Ling: Zaanen "Bootstrapping..."
LINGUIST List
linguist at linguistlist.org
Fri Apr 4 14:09:13 UTC 2003
LINGUIST List: Vol-14-1004. Fri Apr 4 2003. ISSN: 1068-4875.
Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
Reviews (reviews at linguistlist.org):
Simin Karimi, U. of Arizona
Terence Langendoen, U. of Arizona
Home Page: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Anita Yahui Huang <anita at linguistlist.org>
=================================Directory=================================
1)
Date: Thu, 03 Apr 2003 04:23:45 +0000
From: mvzaanen at uvt.nl
Subject: Computational Ling: Zaanen "Bootstrapping Structure into Language..."
-------------------------------- Message 1 -------------------------------
Institution: University of Leeds
Program: School of Computing
Dissertation Status: Completed
Degree Date: 2002
Author: Menno M. van Zaanen
Dissertation Title:
Bootstrapping Structure into Language: Alignment-Based Learning
Dissertation URL: http://ilk.uvt.nl/~mvzaanen/publications.html
Linguistic Field: Computational Linguistics
Dissertation Director 1: Rens Bod
Dissertation Director 2: Eric Atwell
Dissertation Abstract:
This thesis introduces a new unsupervised learning framework, called
Alignment-Based Learning (ABL), which is based on the alignment of
sentences and Harris's (1951) notion of substitutability. Instances of
the framework can be applied to an untagged, unstructured corpus of
natural language sentences, producing a labelled, bracketed version of
that corpus.
Firstly, the alignment learning phase aligns the sentences of the corpus
in pairs, dividing each sentence of a pair into parts that are equal in
both sentences and parts that are unequal. Unequal parts can be seen as
substitutable for each other, since substituting one unequal part for
the other results in another valid sentence. The unequal parts are
thus considered possible (and possibly overlapping) constituents,
called hypotheses.
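As an illustration only, the alignment step can be sketched in Python under simplifying assumptions: sentences are whitespace-tokenized, each pair is aligned by a longest-common-subsequence match over words (the thesis may use other alignment methods), and a hypothesis pair is recorded only where both unequal parts are non-empty. Spans are half-open (start, end) word indices, and all function names are hypothetical.

```python
def lcs_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def align(a, b):
    """Return (i, j) index pairs of words treated as equal in both sentences."""
    t = lcs_table(a, b)
    i = j = 0
    links = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            links.append((i, j))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            i += 1
        else:
            j += 1
    return links


def hypotheses(a, b):
    """Pair up the unequal parts lying between consecutive equal words."""
    links = align(a, b)
    # Sentinels expose the gaps before the first and after the last match.
    bounds = [(-1, -1)] + links + [(len(a), len(b))]
    hyps = []
    for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
        span_a, span_b = (i0 + 1, i1), (j0 + 1, j1)
        # Assumption of this sketch: skip gaps where one side is empty.
        if span_a[0] < span_a[1] and span_b[0] < span_b[1]:
            hyps.append((span_a, span_b))
    return hyps


s1 = "What is a family fare".split()
s2 = "What is the payload of an African Swallow".split()
print(hypotheses(s1, s2))  # [((2, 5), (2, 8))]
```

Here "a family fare" and "the payload of an African Swallow" are paired as mutually substitutable, since both follow the shared prefix "What is".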
Secondly, the selection learning phase considers all hypotheses found
by the alignment learning phase and selects the best of them, based
either on the order in which the hypotheses were found or on a
probabilistic function.
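The two kinds of selection can likewise be sketched, again as a hypothetical simplification rather than the thesis's own functions. In this sketch two hypotheses in one sentence conflict when their spans cross (overlap without nesting); the order-based strategy keeps earlier-found hypotheses over later conflicting ones; and the probabilistic score is assumed to be the relative frequency of a hypothesis's word sequence among all hypotheses in the corpus.

```python
from collections import Counter


def crosses(s, t):
    """True if half-open spans s and t overlap without one containing the other."""
    (a, b), (c, d) = s, t
    return a < c < b < d or c < a < d < b


def select_by_order(hyps):
    """Keep hypotheses in the given order, dropping any that cross a kept one."""
    kept = []
    for h in hyps:
        if not any(crosses(h, k) for k in kept):
            kept.append(h)
    return kept


def select_by_probability(sentence, hyps, yield_counts):
    """Rank spans by the (assumed) relative frequency of their word yield."""
    total = sum(yield_counts.values())

    def score(span):
        return yield_counts[tuple(sentence[span[0]:span[1]])] / total

    # sorted() is stable, so ties keep the order in which hypotheses were found.
    return select_by_order(sorted(hyps, key=score, reverse=True))


sentence = "show me flights from Boston to Denver".split()
hyps = [(3, 5), (4, 5), (4, 7)]  # hypothetical spans, in order found
counts = Counter({("Boston", "to", "Denver"): 5, ("from", "Boston"): 1, ("Boston",): 3})
print(select_by_order(hyps))                          # [(3, 5), (4, 5)]
print(select_by_probability(sentence, hyps, counts))  # [(4, 7), (4, 5)]
```

Both strategies share the same conflict test; they differ only in which hypothesis gets to claim its span first.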
The framework can be extended with a grammar extraction phase; this
extended framework is called parseABL. In addition to returning a
structured version of the unstructured input corpus, as the ABL system
does, parseABL also returns a stochastic context-free or tree
substitution grammar.
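Grammar extraction can be illustrated with the standard relative-frequency estimate for a stochastic context-free grammar, P(A → α) = count(A → α) / count(A), read off a labelled, bracketed corpus. This is an assumed, simplified sketch: trees are nested (label, children) tuples, terminals are plain strings, the non-terminal names are hypothetical, and tree substitution grammars are not covered.

```python
from collections import Counter


def rules(tree):
    """Yield every (lhs, rhs) rule used in a (label, children) tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def extract_pcfg(treebank):
    """Map each rule to count(rule) / count(all rules with the same lhs)."""
    rule_counts = Counter()
    for tree in treebank:
        rule_counts.update(rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


treebank = [
    ("S", ["What", "is", ("X1", ["a", "family", "fare"])]),
    ("S", ["What", "is", ("X1", ["the", "payload", "of", "an", "African", "Swallow"])]),
]
for rule, p in extract_pcfg(treebank).items():
    print(rule, p)
```

With this toy corpus the rule S → What is X1 gets probability 1.0, and each of the two X1 expansions gets 0.5.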
Different instances of the framework have been tested on the English
ATIS corpus, the Dutch OVIS corpus and the Wall Street Journal
corpus. Apart from the encouraging numerical results, one interesting
finding is that all instances can (and do) learn recursive
structures.
---------------------------------------------------------------------------