19.836, Diss: Comp Ling/Morphology/Syntax/Text/Corpus Ling: Goyal: 'Example...'

Thu Mar 13 02:41:15 UTC 2008

LINGUIST List: Vol-19-836. Wed Mar 12 2008. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Evelyn Richter <evelyn at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 12-Mar-2008
From: Shailly Goyal < shaillygoyal at gmail.com >
Subject: Example-Based Parsing for Resource-Deficient Languages

-------------------------Message 1 ---------------------------------- 
Date: Wed, 12 Mar 2008 22:39:00
From: Shailly Goyal [shaillygoyal at gmail.com]
Subject: Example-Based Parsing for Resource-Deficient Languages

Institution: Indian Institute of Technology, Delhi 
Program: Ph.D. 
Dissertation Status: Completed 
Degree Date: 2007 

Author: Shailly Goyal

Dissertation Title: Example-Based Parsing for Resource-Deficient Languages 

Linguistic Field(s): Computational Linguistics
                     Morphology
                     Syntax
                     Text/Corpus Linguistics

Subject Language(s): English (eng)
                     Hindi (hin)

Dissertation Director(s):
Niladri Chatterjee

Dissertation Abstract:

The aim of the present research is to develop parsing schemes for natural language
sentences. A parsed corpus is essential for various natural language processing
(NLP) activities, but its availability cannot be guaranteed for most languages.
Furthermore, developing a parsed corpus or a parser is not an easy task with the
traditional approaches, viz. rule-based and statistical. This is because the
success of these approaches almost invariably demands a huge amount of
computational resources that are typically not available for most natural
languages. We feel that example-based (EB) approaches can serve as suitable
alternatives at this juncture.

The major advantage of these approaches is that they demand far fewer
computational resources than the traditional approaches, yet they are useful in
developing robust techniques, as is envisaged in many areas of artificial
intelligence, NLP in particular. In this work, we have pursued the following two
aspects of example-based parsing:

Bilingual Parsing: In this methodology, a sentence is parsed using the parse of
its parallel sentence in another language. While projecting the syntactic
relations from one language to the other, we have considered the similarities as
well as the dissimilarities between the two languages. Hence, we are able to
develop generalized schemes that can work on a wide variety of source-target
language pairs. We have developed parsing schemes for simple as well as complex
sentences.
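
As a rough illustration of the projection idea in bilingual parsing, the Python
sketch below copies the links of a parsed source sentence onto its parallel
target sentence through a word alignment. It is not the dissertation's
algorithm: the function name project_links, the hand-written alignment, and the
example sentence pair are assumptions made for illustration, and the sketch
ignores the language dissimilarities that the abstract says the real schemes
account for. The labels S (subject to verb) and O (verb to object) follow the
English link grammar.

```python
from typing import Dict, List, Tuple

# A link is (label, left word index, right word index), with left < right.
Link = Tuple[str, int, int]


def project_links(source_links: List[Link],
                  alignment: Dict[int, int]) -> List[Link]:
    """Project source-sentence links onto the target sentence.

    `alignment` maps a source word index to a target word index
    (a hypothetical one-to-one word alignment; unaligned words are absent).
    """
    projected = []
    for label, left, right in source_links:
        # Skip links whose endpoints have no counterpart in the target.
        if left not in alignment or right not in alignment:
            continue
        a, b = alignment[left], alignment[right]
        if a == b:
            continue  # both words map to the same target word: no link
        # Word order may differ, so re-normalise to left < right.
        projected.append((label, min(a, b), max(a, b)))
    return sorted(projected, key=lambda link: (link[1], link[2]))


# English: "Ram eats apples"      (indices 0, 1, 2)
# Hindi:   "raam seb khaataa hai" (indices 0, 1, 2, 3)
english_links = [("S", 0, 1), ("O", 1, 2)]
alignment = {0: 0, 1: 2, 2: 1}  # Ram->raam, eats->khaataa, apples->seb

hindi_links = project_links(english_links, alignment)
print(hindi_links)
```

In this toy case the auxiliary "hai" has no English counterpart and therefore
receives no link, which is one example of a dissimilarity that plain projection
cannot resolve on its own.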

Monolingual Parsing: In this scheme, a sentence is parsed using the parse
knowledge of examples from the same language, acquired from a parsed example
base of that language. In this work, we have devised ways to deal effectively
with various problems, such as unknown words, free word order, and morphological
variation.

We have developed both these schemes in a generalized way, with minimal
dependence on linguistic knowledge, so that they can be used across a wide
spectrum of languages. In this work we have done a thorough case study on Hindi.
For the finer details of the parsing schemes, where linguistic details are
inevitable, we have considered English and Hindi as the source and target
languages, respectively. Furthermore, we have chosen link grammar as the
underlying grammar for representing the parse of sentences. One fundamental
requirement, therefore, is a link grammar for the languages under consideration.
Since no such grammar exists for Hindi, we have developed a link grammar for
Hindi. For this task, too, we follow an example-based approach.
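
For the monolingual scheme described above, the retrieval step of an
example-based parser can be sketched as follows. This is only an assumed,
simplified stand-in and not the method of the dissertation: it picks the stored
example whose part-of-speech tag sequence is most similar to that of the input
sentence, using Python's standard difflib.SequenceMatcher as the similarity
measure. The function name closest_example, the tag names, and the tiny example
base are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# An example is (POS tag sequence, links), with links as (label, left, right).
Example = Tuple[List[str], List[Tuple[str, int, int]]]


def closest_example(tags: List[str], example_base: List[Example]) -> Example:
    """Return the stored example whose tag sequence best matches `tags`."""
    return max(
        example_base,
        key=lambda ex: SequenceMatcher(None, tags, ex[0]).ratio(),
    )


# Hypothetical parsed example base (tag sequence plus its links).
example_base = [
    (["NOUN", "NOUN", "VERB", "AUX"], [("S", 0, 2), ("O", 1, 2)]),
    (["NOUN", "VERB"], [("S", 0, 1)]),
]

# Input sentence with one extra adjective compared to the first example.
best_tags, best_links = closest_example(
    ["NOUN", "ADJ", "NOUN", "VERB", "AUX"], example_base
)
print(best_tags, best_links)
```

A real system would still have to adapt the retrieved parse to the input
sentence, for instance to attach the extra adjective, and to cope with unknown
words and free word order.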

Development of Hindi Link Grammar: Instead of developing the link grammar for
Hindi from scratch, in this work we have made appropriate modifications to the
English link grammar to suit the requirements of Hindi grammar. We have shown
how English links can be adapted for Hindi by taking care of its various
grammatical nuances (e.g. free word order, noun and verb morphology, influence
of subject/object on verb morphology) that make Hindi grammar distinctly
different from English grammar. The parsing schemes developed in this work have
been implemented and tested on a reasonably sized example base. Still, we have
been able to demonstrate the efficacy of these schemes clearly. We feel that our
research will pave the way for the quick development of parsers for other
languages.

-----------------------------------------------------------
LINGUIST List: Vol-19-836