15.1279, Diss: Computational Ling: Diab: 'Word...'

Thu Apr 22 18:55:20 UTC 2004

LINGUIST List:  Vol-15-1279. Thu Apr 22 2004. ISSN: 1068-4875.

Subject: 15.1279, Diss: Computational Ling: Diab: 'Word...'

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Sheila Collberg, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Takako Matsui <tako at linguistlist.org>
 ==========================================================================
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
=================================Directory=================================

1)
Date:  Tue, 20 Apr 2004 21:13:31 -0400 (EDT)
From:  mdiab at stanford.edu
Subject:  Word Sense Disambiguation within a Multilingual Framework

-------------------------------- Message 1 -------------------------------

Date:  Tue, 20 Apr 2004 21:13:31 -0400 (EDT)
From:  mdiab at stanford.edu
Subject:  Word Sense Disambiguation within a Multilingual Framework

Institution: University of Maryland
Program: Department of Linguistics
Dissertation Status: Completed
Degree Date: 2003

Author: Mona Talat Diab

Dissertation Title: Word Sense Disambiguation within a Multilingual
Framework

Dissertation URL: http://www.umiacs.umd.edu/~mdiab/finalthesis.pdf

Linguistic Field: Computational Linguistics

Dissertation Director 1: Philip Resnik

Dissertation Abstract:

Word Sense Disambiguation (WSD) is the process of resolving the
meaning of a word unambiguously in a given natural language
context. Within the scope of this thesis, it is the process of marking
text with explicit sense labels.

What constitutes a sense is a subject of great debate. An appealing
perspective aims to define senses in terms of their multilingual
correspondences, an idea explored by several researchers, including
Dyvik (1998), Ide (1999), Resnik & Yarowsky (1999), and Chugur,
Gonzalo & Verdejo (2002), but to date it has not been given any
practical demonstration. This thesis is an empirical validation of
these ideas of characterizing word meaning using cross-linguistic
correspondences. The idea is that a word meaning or word sense is
quantifiable to the extent that it is uniquely translated in some
language or set of languages.

Consequently, we address the problem of WSD from a multilingual
perspective; we expand the notion of context to encompass multilingual
evidence. We devise a new approach to resolve word sense ambiguity in
natural language, using a source of information that has not
previously been exploited on a large scale for WSD.

The core of the work presented builds on exploiting word
correspondences across languages for sense distinction. In essence, it
is a practical and functional implementation of a basic idea shared by
research on defining word meanings in cross-linguistic terms.

We devise an algorithm, SALAAM (Sense Assignment Leveraging Alignment
And Multilinguality), that empirically investigates the feasibility
and the validity of utilizing translations for WSD. SALAAM is an
unsupervised approach for word sense tagging of large amounts of text,
given a parallel corpus (texts in translation) and a sense inventory
for one of the languages in the corpus. Using SALAAM, we obtain large
amounts of sense-annotated data in both languages of the parallel
corpus simultaneously. The quality of the tagging is rigorously
evaluated for both languages of the corpus.
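
The abstract does not spell out SALAAM's steps, so the following is
only a minimal Python sketch of the general idea of using translations
for sense tagging, not the thesis implementation. It assumes that
source words aligned to the same target word are grouped, and that
each word then receives the sense most similar to the senses of the
other words in its group. The toy inventory SENSES, the keyword-overlap
similarity, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy sense inventory: word -> {sense label: gloss keywords}.
SENSES = {
    "bank": {"bank#finance": {"money", "institution"},
             "bank#river": {"river", "land", "slope"}},
    "shore": {"shore#land": {"land", "sea", "river"}},
    "coast": {"coast#land": {"land", "sea"}},
}


def similarity(gloss_a, gloss_b):
    """Toy stand-in measure: Jaccard overlap of gloss keywords."""
    return len(gloss_a & gloss_b) / len(gloss_a | gloss_b)


def group_by_translation(aligned_pairs):
    """Collect the source words that share the same target translation."""
    groups = defaultdict(set)
    for source_word, target_word in aligned_pairs:
        groups[target_word].add(source_word)
    return groups


def tag_group(words):
    """Pick, for each word, the sense best supported by the other words."""
    tags = {}
    for word in words:
        best_label, best_score = None, -1.0
        for label, gloss in SENSES[word].items():
            # Sum, over the other words, of the best match among their senses.
            score = sum(
                max(similarity(gloss, other) for other in SENSES[peer].values())
                for peer in words if peer != word
            )
            if score > best_score:
                best_label, best_score = label, score
        tags[word] = best_label
    return tags


def tag_corpus(aligned_pairs):
    """Map each target word to the sense tags chosen for its source words."""
    result = {}
    for target_word, words in group_by_translation(aligned_pairs).items():
        known = sorted(w for w in words if w in SENSES)
        result[target_word] = tag_group(known)
    return result


# Made-up English-French word alignments for illustration.
tags = tag_corpus([("bank", "rive"), ("shore", "rive"), ("coast", "rive")])
print(tags["rive"])
```

In this toy run, "bank" is tagged with its river sense because the
other words aligned to "rive" support that reading. A word that is
alone in its group simply receives the first sense listed, and the
projection of tags onto the target-language tokens is not shown.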

The data tagged automatically by SALAAM, without supervision, is
further utilized to bootstrap a supervised learning WSD system, in
essence combining supervised and unsupervised approaches in an
intelligent way to alleviate the resource acquisition bottleneck for
supervised methods. Essentially, SALAAM is extended as an unsupervised
approach for WSD within a learning framework; for many of the words
disambiguated, SALAAM coupled with the machine learning system rivals
the performance of a canonical supervised WSD system that relies on
human-tagged data for training.

Realizing the fundamental role of similarity for SALAAM, we
investigate different dimensions of semantic similarity as it applies
to verbs, since they are relatively more complex than nouns, which are
the focus of the preceding evaluations. We design a human judgment
experiment to obtain ratings of verbs' semantic similarity. The
resulting human ratings serve as a reference point for comparing
different automated similarity measures that crucially rely on various
sources of information. Finally, a cognitively salient model
integrating human judgments in SALAAM is proposed as a means of
improving its performance on sense disambiguation for verbs in
particular and other word types in general.
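
The abstract does not say how automated measures are compared against
the human ratings. One common option, assumed here purely for
illustration, is rank correlation: the sketch below computes Spearman's
coefficient between a list of human ratings and the scores of two
hypothetical measures for the same verb pairs. All numbers are made up.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up ratings for four verb pairs (illustrative values only).
human = [4.0, 1.0, 3.0, 2.0]
measure_a = [0.9, 0.1, 0.7, 0.3]   # hypothetical measure, same ordering
measure_b = [0.1, 0.9, 0.3, 0.7]   # hypothetical measure, reversed ordering

print(spearman(human, measure_a), spearman(human, measure_b))
```

Under this assumption, a measure whose ordering of verb pairs agrees
with the human ordering scores close to 1, and one that reverses it
scores close to -1.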

---------------------------------------------------------------------------
LINGUIST List: Vol-15-1279