<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"

            "http://www.w3.org/TR/REC-html40/strict.dtd">

<HTML>

<HEAD>

   <link rel = "stylesheet" href = "../../../utils/default.css">

   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

   <META NAME="Author" CONTENT="Graeme Hirst">

   <META NAME="GENERATOR" CONTENT="Mozilla/4.05 [en] (X11; U; SunOS 5.4 sun4m) [Netscape]">

   <TITLE>Kondrak thesis abstract</TITLE>

</HEAD>

<BODY>

<DIV CLASS="center">

<A CLASS="image" HREF="http://www.cs.utoronto.ca/"><IMG SRC =

"/compling/pics/header.gif" WIDTH = "530" HEIGHT = "25"

ALT="University of Toronto: Department of Computer Science"></A><BR>

<A CLASS="image" HREF="/compling/"><IMG SRC =

"/compling/pics/compling.gif" WIDTH = "530" HEIGHT = "55"

ALT="Computational Linguistics"></A><BR>

</DIV>

<H1>Thesis abstract</H1>

<UL>

<LI><B>Grzegorz Kondrak.</B><BR>

<I>Algorithms for Language Reconstruction.</I><BR>

PhD thesis, Department of Computer Science, University of Toronto, July 2002.

</UL>

<P>Genetically related languages originate from a common

proto-language. In the absence of historical records, proto-languages

have to be reconstructed from surviving cognates, that is, words that

existed in the proto-language and are still present in some form in

its descendants. Language reconstruction methods have so far been

largely based on informal and intuitive criteria. In this thesis, I

present techniques and algorithms for performing various stages of the

reconstruction process automatically.

<P>The thesis is divided into three main parts that correspond to the

principal steps of language reconstruction. The first part presents a

new algorithm for the alignment of cognates, which is sufficiently

general to align any two phonetic strings that exhibit some

affinity. The second part introduces a method of identifying cognates

directly from the vocabularies of related languages on the basis of

phonetic and semantic similarity. The third part describes an approach

to the determination of recurrent sound correspondences in bilingual

wordlists by inducing models similar to those developed for

statistical machine translation.

<P>The proposed solutions are firmly grounded in computer science and

incorporate recent advances in computational linguistics, articulatory

phonetics, and bioinformatics.  The applications of the new techniques

are not limited to diachronic phonology, but extend to other areas of

computational linguistics, such as machine translation.

<P><A NAME="Download"></A><B>Download:</B>  <A

HREF="http://www.cs.toronto.edu/~kondrak/thesis.pdf">PDF file</A>

(1.0 MB); <A

HREF="http://www.cs.toronto.edu/~kondrak/thesis.ps">

PostScript file</A> (1.2 MB).<BR> <B>Request paper copy:</B> Send

request with postal address to <A

href="mailto:gh@cs.toronto.edu">gh@cs.toronto.edu</A>.

<P>

<hr>

<DIV CLASS="center">

<B><A HREF="/compling/">HOME</A> |

<A HREF="/compling/Topics/">RESEARCH</A> |

<A HREF="/compling/Courses/">COURSES</A> |

<A HREF="/compling/People/">PEOPLE</A> |

<A HREF="/compling/Publications/">PUBLICATIONS</A></B>

</DIV>

<HR>

<ADDRESS>

<SMALL>Last modified 25 August 2002.

Comments, complaints, compliments, and reports of broken links to

<A HREF="mailto:gh@cs.toronto.edu">gh@cs.toronto.edu</A>.

Copyright © 2002 University of Toronto.

</SMALL></ADDRESS>

</BODY>

</HTML>