[Lexicog] Issues regarding a free dictionary

Antti Arppe aarppe at LING.HELSINKI.FI
Wed Dec 28 17:27:46 UTC 2005


Dear list members,

Some comments on creating spell-checkers out of computational 
morphological models (better late than never).

On Sat, 10 Dec 2005, Andrew Dunbar wrote:
> On 12/9/05, Mike Maxwell <maxwell at ldc.upenn.edu> wrote:
>> Andrew Dunbar wrote:
>> Agreed, and as you say, with varying degrees of success.  The point of my
>> message was that there is often (maybe usually) much more to creating a
>> spell checker than just having a list of "dictionary words" in the language.
[...]
>>> For inflection, compounding, and other morphology and such issues which
>>> give rise to many correct forms, either a "smart" spellchecker which knows
>>> about paradigms and irregular forms, as well as containing a dictionary is
>>> one approach. Another approach is to put the "smarts" into a program which
>>> builds a full dictionary including all inflections etc from a basic dictionary.
[...]
>> (2) Alternatively, you can parse the inflected forms on the fly, using e.g.
>> a finite state transducer.  Spell correction might be more difficult with
>> this kind of approach (although I'm sure it can be done), but it is perhaps
>> the only feasible route for highly inflected languages like Finnish.

Exactly so. Especially if a language uses compounding in its word 
formation, one needs a computational model which allows for extensive 
compounding if one aspires to any acceptable coverage; one simply 
cannot list (preemptively) all possible forms. This, in turn, 
poses a challenge of its own: a computational model built to 
analyze such compound forms in a satisfactory manner and with 
sufficient coverage can assume that all input forms are grammatical. 
In spell-checking one cannot make this assumption, and all 
too often common typos can be analyzed as structurally possible 
but semantically odd compounds of short and frequent words. Thus, 
if one does not want to allow too many false acceptances, one has 
to limit the scope of compounding with various strategies.
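
To make the false-acceptance problem concrete, here is a minimal sketch 
in Python. It is not how a TWOL-based speller works internally; the 
tiny English lexicon, the function names and the two limits 
(min_part_len, max_parts) are all hypothetical and chosen only for 
illustration. It shows how an unrestricted compound analysis accepts 
common errors such as "alot" and "infact", and how one simple 
restriction on compound parts rejects them while keeping real compounds.

```python
# Toy lexicon; purely illustrative, not taken from any real speller.
LEXICON = {"a", "as", "in", "lot", "fact", "well",
           "book", "shelf", "word", "list"}


def segmentations(word, lexicon, min_part_len=1, max_parts=None):
    """Return every split of `word` into lexicon entries.

    min_part_len and max_parts are hypothetical knobs standing in for
    the "various strategies" that limit the scope of compounding.
    """
    results = []

    def walk(start, parts):
        if start == len(word):
            results.append(tuple(parts))
            return
        if max_parts is not None and len(parts) >= max_parts:
            return  # too many parts already, give up on this branch
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in lexicon and len(piece) >= min_part_len:
                walk(end, parts + [piece])

    walk(0, [])
    return results


def accepts(word, lexicon, min_part_len=1, max_parts=None):
    """Accept plain lexicon words, or compounds that satisfy the limits."""
    if word in lexicon:
        return True
    return bool(segmentations(word, lexicon, min_part_len, max_parts))


if __name__ == "__main__":
    for w in ["bookshelf", "wordlist", "alot", "infact", "aswell"]:
        loose = accepts(w, LEXICON)
        strict = accepts(w, LEXICON, min_part_len=3)
        print(f"{w:10} unrestricted={loose!s:5} min_part_len=3: {strict}")
```

With no limits, all five inputs are accepted, including the three 
errors. Requiring every compound part to have at least three letters 
rejects the errors and still accepts "bookshelf" and "wordlist". The 
price is that genuine compounds with short parts would be rejected as 
well, which is why such limits have to be tuned for each language.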

In practice, such spell-checkers were developed at Lingsoft 
<www.lingsoft.fi> in the 1990s for Finnish, Swedish, Norwegian, 
Danish and German. These spellers were based on morphological 
models and lexicons according to Koskenniemi's two-level 
morphology (TWOL), which belongs to the realm of finite-state 
models. The core of the above problems is illustrated, among 
other things, in the following article:

The Very Long Way from Basic Linguistic Research to Commercially 
Successful Language Technology: the Case of Two-Level Morphology. In: 
Inquiries into Words, Constraints, and Contexts. Festschrift in the 
Honour of Professor Kimmo Koskenniemi on his 60th Birthday. (2005 
forthcoming). Arppe, Antti; Carlson, Lauri; Lindén, Krister; 
Piitulainen, Jussi; Suominen, Mickael; Vainio, Martti; Westerlund, 
Hanna; Yli-Jyrä, Anssi (Editors). CSLI Studies in Computational 
Linguistics ONLINE. Copestake, Ann (Series editor), pp. 2-17. URL: 
http://www.ling.helsinki.fi/~aarppe/Publications/KK60-1-Arppe.pdf

The above is based on more extensive course material, which is only 
available in Finnish/Swedish. In short, listing words and their 
morphological rules can yield a good linguistic analyzer (aka word 
breaker), but a lot of work remains to transform it into a good 
speller.

Regards,

    -Antti Arppe

--
======================================================================
Antti Arppe - Master of Science (Engineering)
Researcher & doctoral student (Linguistics)
E-mail: antti.arppe at helsinki.fi
WWW: http://www.ling.helsinki.fi/~aarppe
----------------------------------------------------------------------
Work: Department of General Linguistics, University of Helsinki
Work address: P.O. Box 9 (Siltavuorenpenger 20 A)
    00014 University of Helsinki, Finland
Work telephone: +358 9 19129312 (int'l) 09-19129312 (in Finland)
Work telefax: +358 9 19129307 (int'l) 09-19129307 (in Finland)
----------------------------------------------------------------------
Private address: Fleminginkatu 25 E 91, 00500 Helsinki, Finland
Private telephone: +358 50 5909015 (int'l) 050-5909015 (in Finland)
----------------------------------------------------------------------
