16.1188, Diss: Lexicography: Mills: 'Computer-assisted ...'

LINGUIST List linguist at linguistlist.org
Thu Apr 14 17:59:41 UTC 2005


LINGUIST List: Vol-16-1188. Thu Apr 14 2005. ISSN: 1068 - 4875.

Subject: 16.1188, Diss: Lexicography: Mills: 'Computer-assisted ...'

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Collberg, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Takako Matsui <tako at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================

1)
Date: 13-Apr-2005
From: Jon Mills < j.mills at email.com >
Subject: Computer-assisted Lemmatisation of a Cornish Text Corpus for Lexicographical Purposes

	
-------------------------Message 1 ----------------------------------
Date: Thu, 14 Apr 2005 13:57:49
From: Jon Mills < j.mills at email.com >
Subject: Computer-assisted Lemmatisation of a Cornish Text Corpus for Lexicographical Purposes



Institution: University of Exeter
Program: Department of Language and Linguistics
Dissertation Status: Completed
Degree Date: 2002

Author: Jon Mills

Dissertation Title: Computer-assisted Lemmatisation of a Cornish Text Corpus
for Lexicographical Purposes

Dissertation URL:  http://www.geocities.com/f_j_mills/thesis_abstract.html

Linguistic Field(s): Lexicography

Subject Language(s): Cornish (CRN)
                     Old Cornish (OCO)
                     Middle Cornish (CNX)


Dissertation Director(s):
Reinhard Hartmann

Dissertation Abstract:

This project sets out to discover and develop techniques for the
lemmatisation of a historical corpus of the Cornish language in order that
a lemmatised dictionary macrostructure can be generated from the corpus.
The system should be capable of uniquely identifying every lexical item
that is attested in the corpus. A survey of published and unpublished
Cornish dictionaries, glossaries and lexicographical notes was carried out.
A corpus was compiled incorporating specially prepared new critical
editions. An investigation into the history of Cornish lemmatisation was
undertaken. A systemic description of Cornish inflection was written.
Three methods of corpus lemmatisation were trialed. Findings were as
follows. Lexicographical history shapes current Cornish lexicographical
practice. Lexicon-based tokenisation has advantages over character-based
tokenisation. System networks provide the means to generate base forms from
attested word types. Grammatical difference is the most reliable way of
disambiguating homographs. A lemma that contains three fields, the
canonical form, the part-of-speech and a semantic field label, provides
a unique code for every lexeme attested in the corpus. Programs which
involve human interaction during the lemmatisation process allow
bootstrapping of the lemmatisation database. Computerised morphological
processing may be used at least to partially create the lemmatisation
database. Disambiguation of at least some of the most common homographs may
be automated by the use of computer programs.
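
The three-field lemma described above can be illustrated with a minimal
sketch. It is not taken from the dissertation: the class name, the field
names, the "|" separator and the English example words are all assumptions
chosen for illustration only. It shows how homographs that share a
canonical form still receive distinct codes once part-of-speech and
semantic field are included.

```python
from typing import NamedTuple


class Lemma(NamedTuple):
    """Hypothetical three-field lemma: canonical form, part-of-speech, semantic field."""
    canonical_form: str
    part_of_speech: str
    semantic_field: str

    def code(self) -> str:
        # Join the three fields into one key; the "|" separator is an assumption.
        return f"{self.canonical_form}|{self.part_of_speech}|{self.semantic_field}"


# Illustrative English homographs (placeholders, not Cornish corpus data).
lemmas = [
    Lemma("bank", "noun", "finance"),
    Lemma("bank", "noun", "landscape"),
    Lemma("bank", "verb", "finance"),
]

codes = [lemma.code() for lemma in lemmas]

# All three entries share a canonical form, yet every code is distinct.
assert len(set(codes)) == len(codes)
```

In this sketch the first and third entries are separated by grammatical
difference alone, while the first and second need the semantic field label
as well.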




-----------------------------------------------------------
LINGUIST List: Vol-16-1188	

	


