[Lexicog] polysynthetic languages and dictionaries
William J Poser
billposer at ALUM.MIT.EDU
Thu May 27 21:11:46 UTC 2004
>I am under the impression that the particular strategies proposed by
>the human language technology camp are not in anyway related to the
>intralinguistic system of polysynthesis. rather the "rules" or parsing
>strategies that are applied towards generating surface forms or what
>have you are a consequence of the programming language itself and have
>very little to do with polysynthetic systems.
This is a mistaken impression. What we're talking about is the idea
that an electronic dictionary that is able to emulate the rules
by which we produce and understand the very complex words of
polysynthetic languages provides a solution to the problem of looking
up such words. In a language like English, it is possible to list
verbs under their infinitive, for example, because English verbs don't have
very many forms and are put together in a fairly simple way, so it
doesn't take a lot of effort or training on the part of the user to
figure out how to look up an inflected verb form. For instance, although
"parses" is not listed, it isn't very hard to learn to look it up under
"parse". But in a language in which verbs have large numbers of forms
that are not put together in a simple way, it is problematic to figure
out what to look up, and you can't list every possible form because
there are too many. A morphological parser, however, can analyze a
fully inflected verb form, figure out where to look it up, and return
to the user both the basic dictionary entry and information about the
whole complex form.
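
To make the lookup idea concrete, here is a minimal sketch in Python. It
is not a real morphological parser and is not taken from any tool
mentioned in this thread: the lexicon, the suffix rules, and the name
lookup() are all hypothetical, and it only strips a few English suffixes.
A parser for a polysynthetic language would need a far richer rule set,
typically compiled with a finite-state toolkit. The shape of the result
is the point: the user types the inflected form and gets back the
headword, its entry, and an analysis of the form.

```python
# Toy lexicon: headword -> gloss. Entries are illustrative assumptions.
LEXICON = {
    "parse": "to analyze the structure of (a word or sentence)",
    "walk": "to move on foot",
}

# Hypothetical suffix rules:
# (surface suffix, text restored to rebuild the headword, analysis label)
SUFFIX_RULES = [
    ("ing", "e", "present participle"),
    ("ing", "", "present participle"),
    ("ed", "e", "past"),
    ("ed", "", "past"),
    ("s", "", "3rd person singular present"),
]


def lookup(form):
    """Return a list of (headword, gloss, analysis) for the given form."""
    results = []
    # A form that is itself a headword needs no analysis.
    if form in LEXICON:
        results.append((form, LEXICON[form], "base form"))
    # Otherwise strip each suffix in turn and see whether the
    # restored candidate is listed in the lexicon.
    for suffix, restore, label in SUFFIX_RULES:
        if form.endswith(suffix) and len(form) > len(suffix):
            candidate = form[: -len(suffix)] + restore
            if candidate in LEXICON:
                entry = (candidate, LEXICON[candidate], label)
                if entry not in results:
                    results.append(entry)
    return results
```

Calling lookup("parses") finds the entry for "parse" and labels the form
as third person singular present; a form with no analysis returns an
empty list.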
Far from this being something that flows from programming languages
rather than from human language, it is quite the opposite. The
structure of complex words and the rules that govern it are
part of language, and we are talking about how computers can
be used to deal with them. Indeed, part of the discussion that has gone
on here is about the fact that some tools for doing morphology
on a computer are better adapted to the way human language works
than others. One person commented that so-called Two Level morphology,
as used for example by Kimmo Koskenniemi, is computationally very
efficient but not very well adapted to human languages. Mike Maxwell
replied that not all finite state tools are Two Level and in particular
pointed out that Xerox's xfst program is much better adapted.
Questions of efficiency and storage do of course come up, just
as there are practical concerns with anything you do, but they
have come up in the context of asking whether it is currently
practical to use this approach to dealing with the dictionary
lookup problem. In sum, we're starting from the nature of the languages
and asking how we can solve the problems that it poses for
dictionary lookup.
Bill
--
Bill Poser, Linguistics, University of Pennsylvania
http://www.ling.upenn.edu/~wjposer/ billposer at alum.mit.edu