16.295, Diss: Comp Ling/Syntax: Alves: 'Estimation of ...'

LINGUIST List linguist at linguistlist.org
Mon Jan 31 18:30:51 UTC 2005


LINGUIST List: Vol-16-295. Mon Jan 31 2005. ISSN: 1068 - 4875.

Subject: 16.295, Diss: Comp Ling/Syntax: Alves: 'Estimation of ...'

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Collberg, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Takako Matsui <tako at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================

1)
Date: 26-Jan-2005
From: Eduardo Alves < Alves.Eduardo at gmail.com >
Subject: Estimation of Strength of Association and Its Application to Structural Ambiguity Resolution

	
-------------------------Message 1 ----------------------------------
Date: Mon, 31 Jan 2005 13:29:26
From: Eduardo Alves < Alves.Eduardo at gmail.com >
Subject: Estimation of Strength of Association and Its Application to Structural Ambiguity Resolution



Institution: University of Electro-Communications
Program: Computer Science and Information Mathematics
Dissertation Status: Completed
Degree Date: 1999

Author: Eduardo Alves

Dissertation Title: Estimation of Strength of Association and Its Application
to Structural Ambiguity Resolution

Linguistic Field(s): Computational Linguistics
                     Syntax
                     Text/Corpus Linguistics

Subject Language(s): English (ENG)
                     Japanese (JPN)


Dissertation Director(s):
Teiji Furugori

Dissertation Abstract:

Ambiguity resolution is a central issue in natural language processing.  It
is a necessary step, for instance, for devising robust natural language
understanding or machine translation systems.  We propose a corpus-based
method to measure the strength of association between words in linguistic
constructions and then apply it to deciding prepositional phrase
attachments in English, determining the correct dependency structure in
Japanese sentences, and resolving structural ambiguities in Japanese noun
phrases containing the particle 'no'.

Essentially there are two methods to resolve structural ambiguities:
rule-based and corpus-based. In the first method, preference rules
applicable to the disambiguation tasks are derived from linguistic
observations.  In the second method, the preferences for disambiguation are
obtained from statistical measures in large-scale corpora.

In this thesis we base our study on statistical information drawn from
the EDR Corpus and a conceptual dictionary. We use the corpus to obtain
co-occurrence information between two or more words and then calculate the
strength of association using mutual information. When the number of
co-occurrences is zero or low, we use the conceptual dictionary and, using
t-scores, automatically substitute the words with the best possible
conceptual classes.  By doing so, we avoid the noise introduced by
combining all classes, by using unreliable or unrelated classes, and by
clustering words into classes a priori.
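
As a rough illustration of the procedure described above, here is a
minimal Python sketch. The abstract gives no formulas, so the sketch
assumes the standard pointwise mutual information and the usual
collocation t-score; the function names, the tuple layout, and the
min_count threshold are hypothetical and are not taken from the thesis.

```python
import math

def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information log2(P(x,y) / (P(x) * P(y))) from raw counts."""
    if pair_count == 0:
        return float("-inf")
    return math.log2(pair_count * total / (x_count * y_count))

def t_score(pair_count, x_count, y_count, total):
    """Collocation t-score: (observed - expected) / sqrt(observed)."""
    if pair_count == 0:
        return 0.0
    expected = x_count * y_count / total
    return (pair_count - expected) / math.sqrt(pair_count)

def association(word_counts, class_counts, min_count=5):
    """Association strength with back-off to conceptual classes.

    word_counts:  (pair_count, x_count, y_count, total) for the words themselves.
    class_counts: list of tuples in the same layout, one per candidate
                  class substitution taken from a conceptual dictionary.
    min_count:    assumed threshold below which word counts are treated
                  as too sparse (hypothetical value).
    """
    if word_counts[0] >= min_count:
        return pmi(*word_counts)
    if not class_counts:
        return None  # nothing to back off to
    # Pick the class substitution with the highest t-score, then score it.
    best = max(class_counts, key=lambda c: t_score(*c))
    return pmi(*best)
```

The t-score is used only to choose which class substitution to trust;
the association itself is still reported as mutual information, which
is one plausible reading of the paragraph above.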

We verified the effectiveness of our method by applying the strength of
association measure to three structural ambiguity resolution tasks. In
the first experiment we attempted to determine the attachment of
prepositional phrases in English. In the construction V+N+PP
(verb-noun-prepositional phrase), the PP may attach to N or to V. Here, we
employed the association measure to find the attachment of 500 ambiguous
structures and achieved a success rate of 85.6%. The result is an
improvement over other methods (59.6% to 79.5%), and is comparable to the
result of an experiment with human subjects.
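
The attachment decision for V+N+PP can be sketched as a comparison of
two association scores. The abstract does not spell out the decision
rule, so the rule below (attach to whichever head has the higher score,
with noun attachment as an assumed tie-break) and the function names are
illustrative assumptions only.

```python
def attach_pp(verb_assoc, noun_assoc, default="noun"):
    """Choose the attachment site for the PP in a V+N+PP construction.

    verb_assoc: association strength between the verb and the PP.
    noun_assoc: association strength between the noun and the PP.
    default:    assumed tie-break when the scores are equal.
    """
    if verb_assoc > noun_assoc:
        return "verb"
    if noun_assoc > verb_assoc:
        return "noun"
    return default

def success_rate(predictions, gold):
    """Fraction of predicted attachments that match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For scale, a success rate of 85.6% on 500 structures corresponds to 428
correct attachment decisions.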

In a second experiment we tried to resolve ambiguities in Japanese
sentences.  Due to the relatively free word order in Japanese, it is quite
difficult to determine the governor-dependent relations in a sentence.
Here, for 75 constructions taken from sentences each containing an average
of 8.68 probable structures, we obtained a success rate of 87.0%, a
significant improvement over other methods whose success rates ranged from
70.6% to 82.6%.
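
One simple way to use an association measure for choosing among several
candidate dependency structures is to score each candidate by the sum of
the associations of its governor-dependent pairs and keep the best one.
The abstract does not say how the scores are combined, so the summation,
the data layout, and the names below are assumptions made purely for
illustration.

```python
def best_structure(candidates, assoc):
    """Return the candidate structure with the highest total association.

    candidates: non-empty list of structures; each structure is a list of
                (dependent, governor) pairs.
    assoc:      function mapping (dependent, governor) to a numeric score.
    """
    def total(structure):
        return sum(assoc(dep, gov) for dep, gov in structure)
    return max(candidates, key=total)

# Toy scores for placeholder tokens; these numbers are invented.
toy_scores = {
    ("w1", "w2"): 0.5,
    ("w1", "w3"): 2.0,
    ("w2", "w3"): 1.5,
}

def toy_assoc(dep, gov):
    # Unseen pairs get a score of 0.0 in this sketch.
    return toy_scores.get((dep, gov), 0.0)

structure_a = [("w1", "w2"), ("w2", "w3")]  # w1 depends on w2
structure_b = [("w1", "w3"), ("w2", "w3")]  # w1 depends on w3
chosen = best_structure([structure_a, structure_b], toy_assoc)
```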

In the last experiment, we attempted to find the correct structure for
Japanese noun-phrases containing the particle 'no'.  Here, for 429 'no'
constructions, we obtained a success rate of 77.6%. Although this rate is
not especially high, it is an improvement over the performance of other
methods on the same data (72.7% to 73.2%).

The class-based association measure we propose captures the relevant
information effectively by selecting reliable data. It is also general
and widely applicable, since it uses no rules or idiosyncratic processes.
The method may also be applicable to linguistic phenomena other than
syntactic ambiguity.




-----------------------------------------------------------
LINGUIST List: Vol-16-295	

	


