18.1455, Diss: Computational Ling/Text&Corpus Ling/Translation: Chandra: 'Ma...'

Mon May 14 15:35:40 UTC 2007

LINGUIST List: Vol-18-1455. Mon May 14 2007. ISSN: 1068 - 4875.

Subject: 18.1455, Diss: Computational Ling/Text&Corpus Ling/Translation: Chandra: 'Ma...'

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Laura Welcher, Rosetta Project  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Hunter Lockwood <hunter at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 11-May-2007
From: Subhash Chandra < subhash.jnu at gmail.com >
Subject: Machine Recognition and Morphological Analysis of Subanta-Padas

-------------------------Message 1 ---------------------------------- 
Date: Mon, 14 May 2007 11:31:38
From: Subhash Chandra < subhash.jnu at gmail.com >
Subject: Machine Recognition and Morphological Analysis of Subanta-Padas 

Institution: Jawaharlal Nehru University, New Delhi 
Program: Special Centre for Sanskrit Studies (SCSS) 
Dissertation Status: Completed 
Degree Date: 2006 

Author: Subhash Chandra

Dissertation Title: Machine Recognition and Morphological Analysis of
Subanta-Padas 

Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics
                     Translation

Subject Language(s): Sanskrit (san)

Dissertation Director(s):
Girish Nath Jha

Dissertation Abstract:

The Indian Heritage Group of the Centre for Development of Advanced
Computing (CDAC) has developed a system called DESIKA, which claims to
process all the words of Sanskrit and includes generation and analysis
(parsing). The Rashtriya Sanskrit Vidyapeeth, Tirupathi, under the
leadership of Prof. K. V. Ramakrishnamacharyulu (currently Vice Chancellor
of Rajasthan Sanskrit University), has done commendable work on the
Sansk-net project. Prof. Vineet Chaitanya and Amba Kulkarni are visiting
the institution and are currently guiding several Sanskrit R&D initiatives
with far-reaching consequences.

The Academy of Sanskrit Research, Melkote, Mysore has been actively
involved in bringing together scholars doing technology R&D for Sanskrit and
shAstras on a single platform.

The Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New
Delhi is currently engaged in the following R&D: kAraka analyzer, sandhi
splitter and analyzer, verb analyzer, NP gender agreement, POS tagging of
Sanskrit, online multilingual amarakosha, Panini's AshTAdhyAyI search
engine, and online MahAbhArata indexing. Jha (2006) presented a model of a
Sanskrit Analysis System (SAS). The RCILTS project under Prof. G.V. Singh
at the School of Computer and Systems Sciences has prepared useful
linguistic resources for Sanskrit.

Morphological analyzers for Sanskrit, Telugu, Hindi, Marathi, Kannada and
Punjabi have been developed by the Akshar Bharati group at the Indian
Institute of Technology, Kanpur, and the University of Hyderabad, funded by
the Ministry of Information Technology. The project claims 95% coverage for
Telugu (arbitrary text in modern standard Telugu) and 88% coverage for Hindi.
The system can be downloaded or used online at:
http://www.iiit.net/ltrc/morph/index.htm

Anusaaraka (developed by the Akshar Bharati group, IIIT, Hyderabad) is
software that renders text from one Indian language into another,
a form of machine translation. Its output is comprehensible
to the reader, although at times it may not be grammatical. The system is
available at the IIIT Hyderabad site.

How is this work different?
The work differs from existing research in the following ways:
1. To date, no online RDBMS-based recognizer-analyzer has been available
that accepts input and displays results in Unicode Devanagari script; this
system does both,
2. it takes Devanagari UTF-8 text as input and delivers Devanagari
UTF-8 text as output, using Java servlets on Apache Tomcat with JDBC and an
RDBMS,
3. it gives a comprehensive computational analysis of subanta-padas in a
Sanskrit text, and does basic tagging of verbs and avyayas too,
4. it uses a hybrid approach to process input text: it works on the
morphological nature of bases and applies the vibhakti information for
processing,
5. it can be used in larger Sanskrit processing tasks such as text
simplification and machine translation.
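
The approach described in the list above (recognize avyayas, then match a
subanta against known bases plus vibhakti endings) can be illustrated with a
minimal sketch. This is not the dissertation's implementation: the abstract
describes a Java servlet system backed by an RDBMS, while the sketch below
uses Python and in-memory tables for brevity. The names BASES, AVYAYAS,
ENDINGS and analyze are hypothetical, the ending table covers only a few
singular forms of masculine a-stems, and forms altered by sandhi (such as
रामेण) are not handled.

```python
import unicodedata

# Hypothetical stand-ins for database tables: a tiny lexicon of
# masculine a-stem bases and a short list of indeclinables (avyayas).
BASES = {"राम", "देव", "बालक"}
AVYAYAS = {"च", "न", "एव"}

# Partial ending table for masculine a-stems, singular only:
# ending -> (vibhakti number, case name).
ENDINGS = {
    "ः": (1, "nominative"),
    "म्": (2, "accusative"),
    "ेन": (3, "instrumental"),
    "ाय": (4, "dative"),
    "ात्": (5, "ablative"),
    "स्य": (6, "genitive"),
    "े": (7, "locative"),
}


def analyze(word):
    """Tag a Devanagari word as avyaya, subanta, or unknown."""
    # Normalize so that equivalent Unicode sequences compare equal.
    word = unicodedata.normalize("NFC", word)
    if word in AVYAYAS:
        return {"word": word, "category": "avyaya"}
    # Try longer endings first, and accept a split only when the
    # remaining base is attested in the lexicon.
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending):
            base = word[: -len(ending)]
            if base in BASES:
                vibhakti, case = ENDINGS[ending]
                return {
                    "word": word,
                    "category": "subanta",
                    "base": base,
                    "vibhakti": vibhakti,
                    "case": case,
                    "number": "singular",
                }
    return {"word": word, "category": "unknown"}


if __name__ == "__main__":
    for token in "रामः देवस्य च गच्छति".split():
        print(analyze(token))
```

Checking the stripped base against a lexicon is what keeps the suffix match
from over-generating: a verb such as गच्छति matches no ending and base pair
and is left as unknown rather than being forced into a nominal analysis.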

Summary of chapters
Chapter I discusses morphological analyzers, the current status of R&D in
this field, the structure and organization of the AshTAdhyAyI (AD), and the
subanta of Panini.
Chapter II discusses the subanta formalism of Panini and mechanisms to
recognize verbs, avyayas and subantas in Sanskrit text.
Chapter III discusses the analysis of subanta-padas.
Chapter IV discusses the implementation: the front end, Java
objects, databases, and linguistic resources (corpus, rule bases and example
bases), how they work, the basic requirements of the system, and how
to apply sandhi and subanta rules wherever necessary.
The Conclusion discusses future R&D, limitations of the system and result
analysis.
