17.3141, Diss: Computational Ling/Ling Theories: Buch-Kromann: 'Discontinuou...'

Thu Oct 26 17:05:09 UTC 2006

LINGUIST List: Vol-17-3141. Thu Oct 26 2006. ISSN: 1068 - 4875.

Subject: 17.3141, Diss: Computational Ling/Ling Theories: Buch-Kromann: 'Discontinuou...'

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Laura Welcher, Rosetta Project / Long Now Foundation  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Hannah Morales <hannah at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 26-Oct-2006
From: Matthias Buch-Kromann < mtk.id at cbs.dk >
Subject: Discontinuous Grammar: A dependency-based model of human parsing and language learning 

-------------------------Message 1 ---------------------------------- 
Date: Thu, 26 Oct 2006 13:03:43
From: Matthias Buch-Kromann < mtk.id at cbs.dk >
Subject: Discontinuous Grammar: A dependency-based model of human parsing and language learning 

Institution: Copenhagen Business School 
Program: Department of Computational Linguistics 
Dissertation Status: Completed 
Degree Date: 2006 

Author: Matthias Buch-Kromann

Dissertation Title: Discontinuous Grammar: A dependency-based model of human
parsing and language learning 

Dissertation URL:  http://www.id.cbs.dk/~mtk/thesis

Linguistic Field(s): Computational Linguistics
                     Linguistic Theories

Dissertation Director(s):
Sabine Kirchmeier-Andersen
Carl Vikner

Dissertation Abstract:

In the dissertation, Matthias Buch-Kromann presents his dependency-based
grammar formalism, Discontinuous Grammar. The dissertation argues that
grammars should not only distinguish between grammatical and ungrammatical
linguistic analyses, but should also assign a number (a cost) to the
individual words in both grammatical and ungrammatical analyses. The cost
measures the syntactic, semantic, and pragmatic well-formedness of each
word, so the grammar can be used to localize linguistic errors in the
analysis precisely. In this setting, parsing, generation, and machine
translation can be viewed as optimization problems where the goal is to
find the cheapest analysis that satisfies a given side condition, e.g.,
that the analysis corresponds to a given text (parsing), semantic
representation (generation), or source text (machine translation).
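
The view of parsing as cost optimization can be illustrated with a minimal
sketch. It is not taken from the dissertation: the names WordCost,
total_cost, cheapest_analysis, and worst_word are hypothetical, the cost
values are made up, and summing per-word costs into a total is an assumption
made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordCost:
    word: str
    cost: float  # hypothetical per-word ill-formedness score (higher is worse)


def total_cost(analysis):
    # Assumption: the cost of an analysis is the sum of its word costs.
    return sum(wc.cost for wc in analysis)


def cheapest_analysis(candidates, text):
    # Side condition for parsing: the analysis must cover exactly the given words.
    matching = [a for a in candidates if [wc.word for wc in a] == text]
    return min(matching, key=total_cost) if matching else None


def worst_word(analysis):
    # The most costly word is where the analysis localizes the error.
    return max(analysis, key=lambda wc: wc.cost)


text = ["the", "dogs", "barks"]
candidates = [
    [WordCost("the", 0.0), WordCost("dogs", 1.0), WordCost("barks", 3.0)],
    [WordCost("the", 0.0), WordCost("dogs", 2.0), WordCost("barks", 3.0)],
    [WordCost("a", 0.0), WordCost("dog", 0.0), WordCost("barks", 0.0)],  # wrong text
]

best = cheapest_analysis(candidates, text)
print(total_cost(best), worst_word(best).word)
```

The third candidate is cheaper overall but is rejected because it violates
the side condition. Generation or machine translation would, under the same
assumptions, swap in a different side condition and keep the minimization.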

The dissertation demonstrates how the proposed formalism deals with a wide
range of linguistic phenomena, including the complement and adjunct
distinction; discontinuous word orders and island constraints; control
constructions, relatives, and parasitic gaps; elliptic coordinations;
anaphora and discourse structure; punctuation; and inflectional and
derivational morphology. The dissertation also describes how these analyses
have formed the theoretical basis for the construction of the Danish
Dependency Treebank, a general-purpose corpus for Danish with 100,000 words
equipped with complete dependency analyses.

The dissertation also proposes two methods, HPM and XHPM, for the
statistical estimation of hierarchically classifiable data such as words in
dependency relations, which can be classified according to word class and
ontological class. The dissertation moreover proposes a statistical
language model based on the proposed grammar formalism and estimation
method. Finally, the dissertation proposes a parsing algorithm, local
optimality parsing, which can be used in combination with a manually written
or statistically induced grammar to segment and parse an entire discourse. The
dissertation argues that the parsing algorithm has a number of theoretical
advantages compared with other parsing algorithms, such as its speed (it
has an almost-linear time complexity) and its potential as a plausible
model of human parsing. 
