14.2786, Diss: Computational Ling: Ramaswamy: 'A...'

Wed Oct 15 16:20:12 UTC 2003

LINGUIST List:  Vol-14-2786. Wed Oct 15 2003. ISSN: 1068-4875.

Subject: 14.2786, Diss: Computational Ling: Ramaswamy: 'A...'

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Takako Matsui <tako at linguistlist.org>
==========================================================================
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
=================================Directory=================================

1)
Date:  Tue, 14 Oct 2003 05:50:38 +0000
From:  myholyword at hotmail.com
Subject:  A Morphological Analyzer for Tamil

-------------------------------- Message 1 -------------------------------

Institution: University of Hyderabad
Program: Centre for Applied Linguistics and Translation Studies
Dissertation Status: Completed
Degree Date: 2003

Author: Vaishnavi Ramaswamy

Dissertation Title: A Morphological Analyzer for Tamil

Linguistic Field: Computational Linguistics

Dissertation Director 1: G. Uma Maheshwara Rao

Dissertation Abstract:

This thesis deals with the design and implementation of a
morphological analyzer for the Tamil language. It also includes a
comparative study of certain other models of morphological processing,
in order to assess the advantages of each in terms of suitability
for adaptation to a language like Tamil. The primary aim is to
construct a complete morphological module for Tamil that could be
used in any NLP application, such as a spell checker, POS tagger, or
parser.

Aspects of designing a computational model for morphological analysis
include:

1) Deciding on a model based on psycholinguistic factors.
2) Designing formal methods/techniques that enable the conversion of
theoretical descriptions into computational models.

The analyzer under consideration relies on a theoretical blend of the
Item-and-Arrangement (IA) and Item-and-Process (IP) approaches to
morphological decomposition. Where automatic phonological rules
largely operate, IP is incorporated. In areas where complex but
non-automatic morphophonemics (sandhi) is involved, IA is the choice.
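
The division of labour between IA and IP described above can be
illustrated with a minimal Python sketch. It is not the thesis
implementation: the function names, the tiny lexicons, the single
phonological rule (a stem-final "u" dropped before a vowel-initial
suffix) and the transliterated forms are all simplified assumptions
chosen for illustration only.

```python
# Toy illustration only: forms, rules, and names are assumptions made
# for this sketch, not data or code from the thesis.

VOWELS = set("aeiou")

# Suffixes shared by both strategies (hypothetical, simplified).
SUFFIXES = {"ai": "ACC", "ukku": "DAT"}

# IA side: non-automatic stem alternants are listed as separate items.
STEM_ALLOMORPHS = {
    "maram": "maram",    # citation form, 'tree'
    "maratt": "maram",   # oblique stem, listed rather than derived
}

# IP side: one lexical entry per stem; surface forms come from a rule.
IP_LEXICON = {"paaTTu"}  # 'song'


def analyze_ia(word):
    """Segment a word into a listed stem allomorph plus a listed suffix."""
    analyses = []
    for i in range(len(word), 0, -1):
        stem, suffix = word[:i], word[i:]
        if stem in STEM_ALLOMORPHS and suffix in SUFFIXES:
            analyses.append((STEM_ALLOMORPHS[stem], SUFFIXES[suffix]))
    return analyses


def generate_ip(stem, suffix):
    """Apply the process: drop stem-final 'u' before a vowel-initial suffix."""
    if stem.endswith("u") and suffix[:1] in VOWELS:
        stem = stem[:-1]
    return stem + suffix


def analyze_ip(word):
    """Undo the rule: strip a suffix, restore 'u', and verify by generation."""
    analyses = []
    for suffix, tag in SUFFIXES.items():
        if not word.endswith(suffix):
            continue
        base = word[: -len(suffix)]
        for candidate in (base, base + "u"):
            if candidate in IP_LEXICON and generate_ip(candidate, suffix) == word:
                analyses.append((candidate, tag))
    return analyses


def analyze(word):
    """Try the rule-based (IP) route first, then fall back to listing (IA)."""
    return analyze_ip(word) or analyze_ia(word)
```

In this sketch, analyze("paaTTai") is resolved by reversing the
automatic rule, while analyze("marattai") is resolved by looking up
the listed oblique stem. Which alternations are treated as automatic
is a design decision; the split shown here is only an example.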

Qualitative and quantitative methods in corpus linguistics were
employed to extract frequency counts and collocations of words. All
possible contexts of occurrence and usage of a word were studied. For
every grammatical category of the language, a list of the minimum
number of word-forms required for sufficient coverage was extracted.
Based on such attributes, and in consideration of coverage and
efficiency for a morphological analyzer, an essential set of
morphological paradigms for each word class in Tamil was established.
This served as a database comprising different tables of inflectional
forms of a word, for all the words in the language.
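
As a rough illustration of the corpus counts and the paradigm
database described above, the following Python sketch counts word
frequencies and adjacent word pairs, and expands a paradigm table
into a lookup from inflected form to analysis. The paradigm name,
the suffix slots, the sample lexeme and all function names are
hypothetical; the actual tables in the thesis are not given in this
abstract.

```python
from collections import Counter

# Hypothetical data for illustration; not taken from the thesis.
PARADIGMS = {
    "noun_class_1": {"NOM": "", "ACC": "ai", "DAT": "ukku"},
}
LEXEMES = {"kaal": "noun_class_1"}  # transliterated, 'leg'


def frequency_and_bigrams(tokens):
    """Return word frequencies and counts of adjacent word pairs."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def build_form_index(lexemes, paradigms):
    """Expand each lexeme through its paradigm into a form -> analyses map."""
    index = {}
    for lemma, paradigm_name in lexemes.items():
        for tag, suffix in paradigms[paradigm_name].items():
            index.setdefault(lemma + suffix, []).append((lemma, tag))
    return index


if __name__ == "__main__":
    unigrams, bigrams = frequency_and_bigrams("a b a b c".split())
    print(unigrams.most_common(2))
    print(bigrams.most_common(1))
    print(build_form_index(LEXEMES, PARADIGMS))
```

A precomputed index of this kind trades memory for lookup speed; it
is only one possible way to use paradigm tables and is not claimed
to be the design adopted in the dissertation.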

An analysis of two other well-established models of morphological
analysis, AMPLE and KIMMO, was also undertaken for the purpose of
comparison. They form good platforms for implementing morphological
analyzers in various languages. Implementations of these have been
compared with the Tamil Morph developed here, taking into
consideration factors such as the cost of implementation in terms of
effort and time, coverage, and efficiency.
