18.2657, Diss: Comp Ling/Morphology: Xanthos: 'Apprentissage automatique de ...'

LINGUIST Network linguist at LINGUISTLIST.ORG
Wed Sep 12 16:35:42 UTC 2007


LINGUIST List: Vol-18-2657. Wed Sep 12 2007. ISSN: 1068 - 4875.

Subject: 18.2657, Diss: Comp Ling/Morphology: Xanthos: 'Apprentissage automatique de ...'

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Hunter Lockwood <hunter at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 12-Sep-2007
From: Aris Xanthos < Aris.Xanthos at unil.ch >
Subject: Apprentissage automatique de la morphologie: le cas des structures racine-schème

 

	
-------------------------Message 1 ---------------------------------- 
Date: Wed, 12 Sep 2007 12:33:52
From: Aris Xanthos [Aris.Xanthos at unil.ch]
Subject: Apprentissage automatique de la morphologie: le cas des structures racine-schème


Institution: University of Lausanne 
Program: Department of Linguistics 
Dissertation Status: Completed 
Degree Date: 2007 

Author: Aris Xanthos

Dissertation Title: Apprentissage automatique de la morphologie: le cas des
structures racine-schème 

Linguistic Field(s): Computational Linguistics
                     Morphology

Subject Language(s): Arabic, Standard (arb)


Dissertation Director(s):
François Bavaud
John A. Goldsmith
Remi J. Jolivet

Dissertation Abstract:

This dissertation is concerned with the development of algorithmic methods
for the unsupervised learning of natural language morphology, using a
symbolically transcribed wordlist. It focuses on the case of languages
approaching the introflectional type, such as Arabic or Hebrew. The
morphology of such languages is traditionally described in terms of
discontinuous units: consonantal roots and vocalic patterns. Inferring this
kind of structure is a challenging task for current unsupervised learning
systems, which generally operate with continuous units. 

In this study, the problem of learning root-and-pattern morphology is
divided into a phonological and a morphological subproblem. The
phonological component of the analysis seeks to partition the symbols of a
corpus (phonemes, letters) into two subsets that correspond well with the
phonetic definition of consonants and vowels; building around this result,
the morphological component attempts to establish the list of roots and
patterns in the corpus, and to infer the rules that govern their
combinations. We assess the extent to which this can be done on the basis
of two hypotheses: (i) the distinction between consonants and vowels can be
learned by observing their tendency to alternate in speech; (ii) roots and
patterns can be identified as sequences of the previously discovered
consonants and vowels respectively. 
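
The two hypotheses can be illustrated with a minimal Python sketch. The
abstract does not name the partitioning method used in the dissertation;
Sukhotin's algorithm, a classic distributional procedure that exploits the
tendency of vowels and consonants to alternate, is used here as a stand-in.
The function names, the toy transliterated wordlist, and the "C" slot
notation for patterns are all assumptions made for illustration.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Classify symbols as vowels with Sukhotin's algorithm (stand-in method).

    Each word is a string whose characters are the symbols of the corpus.
    """
    # Symmetric adjacency counts between distinct neighbouring symbols.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for w in words:
        symbols.update(w)
        for a, b in zip(w, w[1:]):
            if a != b:  # the diagonal of the matrix is kept at zero
                adj[a][b] += 1
                adj[b][a] += 1

    # Every symbol starts as a consonant, scored by its row sum.
    sums = {s: sum(adj[s].values()) for s in symbols}
    vowels = set()
    while True:
        candidates = {s: v for s, v in sums.items() if s not in vowels}
        if not candidates:
            break
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=candidates.get)
        if candidates[best] <= 0:
            break
        vowels.add(best)
        # Remaining consonants lose twice their co-occurrence with the new vowel.
        for s in candidates:
            if s != best:
                sums[s] -= 2 * adj[s][best]
    return vowels


def split_root_pattern(word, vowels, slot="C"):
    """Project a word onto its consonants (root) and its vowels (pattern).

    The slot symbol marking consonant positions is a hypothetical notation.
    """
    root = "".join(ch for ch in word if ch not in vowels)
    pattern = "".join(ch if ch in vowels else slot for ch in word)
    return root, pattern


words = ["kataba", "kitab", "kutub", "katib"]  # toy transliterated forms
vowels = sukhotin_vowels(words)
print(sorted(vowels))                       # ['a', 'i', 'u']
print(split_root_pattern("kitab", vowels))  # ('ktb', 'CiCaC')
```

Such a projection treats every consonant as part of the root, so prefixed
forms like "maktab" would yield "mktb"; sorting out cases of this kind is
left to the morphological component described above.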

The proposed algorithm uses a purely distributional method for partitioning
symbols. It then applies analogical principles to identify a preliminary
set of reliable roots and patterns, which it gradually enlarges. This
extension process is guided by an evaluation procedure based on the minimum
description length principle, in line with the approach to morphological
learning embodied in Linguistica (Goldsmith, 2001). The algorithm is
implemented as a computer program named Arabica; it is evaluated with
regard to its ability to account for the system of plural formation in a
corpus of Arabic nouns.
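
The idea behind a minimum description length evaluation can be sketched as
follows. This is a toy cost function, not the procedure used in Arabica or
Linguistica: the fixed per-symbol cost, the uniform pointer cost, and all
function names are simplifying assumptions. It only shows how an analysis
into shared roots and patterns can be preferred because it makes the overall
description of the wordlist shorter.

```python
import math


def list_cost(items, alphabet_size):
    """Bits needed to spell out each distinct item, plus one end marker each."""
    bits_per_symbol = math.log2(alphabet_size + 1)
    return sum((len(item) + 1) * bits_per_symbol for item in set(items))


def pointer_cost(n_choices):
    """Bits needed to pick one entry among n equally likely choices."""
    return math.log2(n_choices) if n_choices > 1 else 0.0


def dl_unanalyzed(words, alphabet_size):
    """Description length when every word is listed as an unanalyzed string."""
    distinct = set(words)
    model = list_cost(distinct, alphabet_size)
    data = len(distinct) * pointer_cost(len(distinct))
    return model + data


def dl_root_pattern(analyses, alphabet_size):
    """Description length when each word is a (root, pattern) pair."""
    distinct = set(analyses)
    roots = {root for root, _ in distinct}
    patterns = {pattern for _, pattern in distinct}
    model = list_cost(roots, alphabet_size) + list_cost(patterns, alphabet_size)
    # Each word is encoded as one pointer to a root and one to a pattern.
    data = len(distinct) * (pointer_cost(len(roots)) + pointer_cost(len(patterns)))
    return model + data


# Toy data: two roots combined with three patterns (hypothetical forms).
analyses = [(r, p) for r in ("ktb", "drs") for p in ("CaCaCa", "CiCaC", "CuCuC")]
words = ["kataba", "kitab", "kutub", "darasa", "diras", "durus"]
alphabet_size = 10  # nine letters plus the slot symbol "C"

print(dl_unanalyzed(words, alphabet_size) > dl_root_pattern(analyses, alphabet_size))
```

In a setting like this one, a candidate root or pattern would be accepted
only if adding it lowers the total description length.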

This thesis shows that complex linguistic structures can be discovered
without recourse to a rich set of a priori hypotheses about the phenomena
under consideration. It illustrates the possible synergy between learning
mechanisms operating at distinct levels of linguistic description, and
attempts to determine where and why such cooperation fails. It concludes
that the tension between the universality of the consonant-vowel
distinction and the specificity of root-and-pattern structure is crucial
for understanding the advantages and weaknesses of this approach. 




