[Lexicog] polysynthetic languages and dictionaries
Mike Maxwell
maxwell at LDC.UPENN.EDU
Thu May 27 15:00:22 UTC 2004
Wayne Leman wrote:
>UNLESS, of course, we use some fuzzy logic, or spelled-something-like
>programming, and/or programming code similar to what is in some
>e-dictionaries for English and other major languages where the user simply
>*begins* typing in the word desired and the program starts displaying all
>possible spellings as soon as, say, the user has typed in five letters.
>
In theory, this would be possible using the Xerox tools (see Bill
Poser's and my earlier emails). What one would do is to dump all
possible wordforms to a file, then load them into a letter trie (a
common computing data structure). Lookup would then be possible (and
fast) beginning with the first letter of the typed-in word.
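To make the trie idea concrete, here is a minimal sketch in Python. It is not the Xerox tools' mechanism; the class names, the `complete` method, and the English sample wordforms are all made up for illustration. It assumes the full list of wordforms is already available in memory.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps one letter to the next TrieNode
        self.is_word = False  # True if the path from the root spells a wordform


class Trie:
    """Letter trie over a list of wordforms (illustrative sketch only)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=20):
        """Return up to `limit` wordforms starting with `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # nothing in the list starts with this prefix
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so the smallest letter is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


# Hypothetical sample data; a real dump would hold the language's own wordforms.
trie = Trie(["walk", "walked", "walking", "walks", "talk"])
print(trie.complete("walk"))
```

Finding the prefix node costs one step per typed letter, regardless of how many wordforms are stored, which is why lookup can begin as soon as the user starts typing.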
Unfortunately, the Xerox tools as implemented enforce a limit on how
many forms can be dumped at once. I'm not sure exactly what the limit
is, but I suspect it's a few hundred forms. So in practice, you can't
do what I describe above. (And of course for an agglutinative or
polysynthetic language, or a language with compounding, this could well
be impractical anyway, as Antti Arppe discusses in his msg.)
If "your" language is not quite so morphologically complex, it would be
possible to create a list of all the wordforms generatable from your
dictionary, by cycling over all the possible combinations of stems and
affixes, and then applying the phonological (or graphemic) rules to the
output. This would require some fairly sophisticated programming (e.g.
you have to implement blocking to avoid generating incorrect regular
forms where irregular forms exist), but is in principle doable (again, I
emphasize, given the right kind of morphology!).
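The generate-and-filter procedure can be sketched roughly as follows, under heavy simplifying assumptions: a toy English lexicon, suffixation only, one graphemic rule, and blocking handled by a table of irregular forms. The names `STEMS`, `SUFFIXES`, `IRREGULAR`, `apply_graphemic_rules`, and `generate_wordforms` are hypothetical, and a real language would need far more than this.

```python
import re

# Toy lexicon (assumed data, for illustration only).
STEMS = {"walk": "verb", "go": "verb", "try": "verb"}
SUFFIXES = {"verb": {"PAST": "ed", "3SG": "s", "PROG": "ing"}}
# Irregular forms that block the corresponding regular form.
IRREGULAR = {("go", "PAST"): "went", ("go", "3SG"): "goes"}


def apply_graphemic_rules(form):
    """Apply spelling rules to a form written as stem+suffix, then drop the boundary."""
    # Consonant + y becomes "ie" before -s and "i" before -ed (try+s -> tries).
    form = re.sub(r"([^aeiou])y\+s$", r"\1ies", form)
    form = re.sub(r"([^aeiou])y\+ed$", r"\1ied", form)
    return form.replace("+", "")


def generate_wordforms():
    forms = set(STEMS)  # bare stems count as wordforms too
    for stem, category in STEMS.items():
        for feature, suffix in SUFFIXES[category].items():
            irregular = IRREGULAR.get((stem, feature))
            if irregular is not None:
                # Blocking: the listed irregular form replaces the regular one.
                forms.add(irregular)
            else:
                forms.add(apply_graphemic_rules(stem + "+" + suffix))
    return sorted(forms)


print(generate_wordforms())
```

The output list is the kind of thing that would then be loaded into a trie. The number of forms grows with the product of stems and affix combinations, which is why this only works for the right kind of morphology.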
Bill mentions this approach in another of his msgs, and touches on the
need for compression. Tries constitute one form of compression, and
finite state tools generally implement some other form of compression.
There was an article on this in the journal Computational Linguistics
a couple of years ago. Unfortunately I can't look it up right now, but if
someone really wants to know, remind me after 1 June (when I get back).
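As a rough illustration of why a trie counts as compression, the sketch below compares the number of letters in a flat word list with the number of nodes in a trie built from the same list. The function name `count_trie_nodes` and the sample words are made up, and a node count is only a proxy: actual memory use depends on how nodes are represented.

```python
def count_trie_nodes(words):
    """Count the letter nodes a trie needs for `words` (end-of-word marks ignored)."""
    root = {}
    nodes = 0
    for word in words:
        node = root
        for ch in word:
            if ch not in node:
                node[ch] = {}  # new node only when the prefix is not shared yet
                nodes += 1
            node = node[ch]
    return nodes


# Hypothetical sample: inflected forms sharing the stem "walk".
words = ["walk", "walked", "walking", "walks"]
flat_letters = sum(len(w) for w in words)
print(flat_letters, count_trie_nodes(words))
```

The shared stem is stored once in the trie, so the saving grows as more forms share prefixes. Shared suffixes are not merged in a plain trie.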
Mike Maxwell