[Lexicog] polysynthetic languages and dictionaries
William J Poser
billposer at ALUM.MIT.EDU
Wed Jun 2 00:36:08 UTC 2004
The reason for having the dictionary emulate the rules of the
grammar is precisely that they aren't transparent to the
dictionary user. Suppose you're a second language learner
who doesn't yet have a good analytic knowledge of the language
and you encounter a big hairy verb form. If the dictionary only
lists infinitives or roots or a selected citation form of some sort,
you may be quite unable to figure out what to look that verb up
under. Suppose, for instance, you encounter the Carrier form
natisdalh, which means "I'm going to walk back", e.g. "I'm going to
go home". A root dictionary would list this under ya "for one person
to walk on a single pair of limbs", as in nusya "I am walking around".
To be able to look at natisdalh and realize that you need to look it up
under ya, you need to be able to pick it apart into na "back",
t "inchoative", i "future", s "first person singular subject",
d "valence prefix that usually accompanies 'back', among other morphemes",
and know that certain /y/-initial stems delete the /y/ when preceded
by the d "valence prefix". You also have to recognize the final /lh/
as a future affirmative marker. So, if you know all this stuff
you can figure out how to look up the root or other pieces of the
verb and thus in principle can figure out what it means.
If you have a computer program that "knows" this, you don't have to.
You enter natisdalh and the computer figures out that it is a form
of "walk" etc. If you don't know this and don't have such a computer
program, you are stuck. How are you going to find the right
information in the dictionary?
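A minimal sketch of what such a program does, assuming a drastically simplified template that knows only the morphemes named above: the prefixes na, t, i, s, d in that order, the suffix lh, the root ya, and the deletion of /y/ after d. The tables and function names are hypothetical and the slot order is taken only from this one example; this is not a description of any existing Carrier analyzer, which would need the full prefix template and much more phonology. Every slot is treated as optional, so on a larger lexicon this search would over-generate analyses.

```python
from typing import Optional

# Illustrative fragment only: the morphemes and glosses from the example
# above, with prefix slots listed in their left-to-right order.
PREFIX_SLOTS = [
    ("na", "back"),
    ("t", "inchoative"),
    ("i", "future"),
    ("s", "1sg subject"),
    ("d", "valence"),
]
SUFFIXES = [("lh", "future affirmative")]
ROOTS = {"ya": "for one person to walk on a single pair of limbs"}

Analysis = list[tuple[str, str]]


def analyze(word: str) -> Optional[tuple[Analysis, str]]:
    """Return (morpheme-by-morpheme analysis, root), or None if unanalyzable."""
    tail: Analysis = []
    for suffix, gloss in SUFFIXES:
        if word.endswith(suffix):
            word = word[: -len(suffix)]
            tail = [(suffix, gloss)]
            break
    return _match(word, 0, [], tail)


def _match(rest: str, slot: int, found: Analysis, tail: Analysis):
    # Try the remainder as a root; after the valence prefix d, also try
    # restoring the /y/ that y-initial stems lose in that position.
    candidates = [rest]
    if found and found[-1][0] == "d":
        candidates.append("y" + rest)
    for stem in candidates:
        if stem in ROOTS:
            return found + [(stem, "ROOT")] + tail, stem
    # Otherwise try filling each later prefix slot (any slot may be empty).
    for i in range(slot, len(PREFIX_SLOTS)):
        prefix, gloss = PREFIX_SLOTS[i]
        if rest.startswith(prefix):
            result = _match(rest[len(prefix):], i + 1,
                            found + [(prefix, gloss)], tail)
            if result is not None:
                return result
    return None
```

Calling analyze("natisdalh") segments the word as na, t, i, s, d, ya, lh and returns ya as the root, which is the headword the user actually needs to find.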
I know of just two other approaches. One is to list every form in the
dictionary. In that case, you just look up natisdalh and if you're
lucky and that form is in the dictionary, you're fine. The problem
is that at least in print you can't afford the space to include
every form of every verb in languages with lots of forms, and even
if you could, as you arguably can if the dictionary is electronic,
entering all of them would be very tedious and error-prone.
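To see why space runs out, note that if inflectional slots combine freely, the number of forms per verb is the product of the number of choices in each slot. The slot names and counts below are invented purely for illustration; they are not figures for Carrier or any other language, and real paradigms have gaps and co-occurrence restrictions that this ignores.

```python
from math import prod

# Hypothetical slot inventory: the numbers of choices are made up.
slot_choices = {
    "subject": 8,
    "object": 8,
    "tense/mode": 4,
    "aspect": 3,
    "negation": 2,
}

forms_per_verb = prod(slot_choices.values())  # 8 * 8 * 4 * 3 * 2
verbs = 2000  # hypothetical size of the verb lexicon
print(forms_per_verb, forms_per_verb * verbs)
```

Even with these modest made-up numbers, this prints 1536 forms per verb and over three million entries for a 2,000-verb lexicon.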
The other approach is to choose a particular fully inflected form
as the citation form. This is the approach of the Young and Morgan
Navajo dictionary. The problem with this is that you need a lot
of knowledge about the morphology to get from the form you want
to look up to the citation form. It has the virtue of being a little
more concrete, which some users prefer, but it doesn't really
solve the problem of requiring a lot of knowledge on the part of
the user. Dine College has a semester-long course for native speakers
of Navajo that it is not too much of an exaggeration to describe
as a course on how to look things up in Young and Morgan.
I don't know much about Nez Perce, but if the complexity is more
in "derivational" than "inflectional" morphology, that may or
may not make a difference. If it isn't too complex, and if the
inflectional stuff is separable from the derivational stuff,
then you might be able to use derivational stems as headwords.
Then people would just have to learn to strip off the inflectional
stuff, and you could give the precise meaning of each derivational
string. This is kind of like the situation in Turkish. Verbs
can be quite long and complicated, but they are exclusively suffixing
and quite regular, agglutinating rather than fusional, so it
isn't too hard for people to learn to chop off the inflectional suffixes
and look up the infinitive. The reason that dictionary lookup in
Athabaskan languages like Carrier is such a problem is that the
inflectional and derivational stuff are mixed up. Roughly speaking,
you have a stem at the end, preceded by inflectional stuff like
subject, tense, aspect, and object, which is then preceded by
derivational stuff. Some prefixes that occur far to the left
obligatorily co-occur with certain stems and have no meaning of
their own. And some categories require prefixes in both regions.
For example, in Stuart/Trembleur Lake Carrier, we have
yalhtuzisduk for "I am not going to speak". The analysis is:
ya-lh-t-z-i-s-duk
YA-Neg-inchoative-Neg-Fut-1ssubj-speak
There are TWO negative prefixes, and the prefix /ya/ at the
beginning has no meaning of its own. "to speak" is a discontinuous
morpheme consisting of the prefix /ya/ plus the stem /duk/.
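Concretely: if the lexicon is keyed on the discontinuous pair (prefix, stem), the key cannot be found by chopping pieces off one end; the program has to pair the stem at the right edge with a prefix at the far left. The sketch below assumes the segmentation is already known (it is copied from the analysis above) and uses a hypothetical one-entry lexicon and a hypothetical lookup_key helper.

```python
# Segmentation and glosses of yalhtuzisduk, copied from the analysis above.
SEGMENTS = [
    ("ya", "YA"),
    ("lh", "Neg"),
    ("t", "inchoative"),
    ("z", "Neg"),
    ("i", "Fut"),
    ("s", "1ssubj"),
    ("duk", "speak"),
]

# Hypothetical lexicon keyed on (disjunct prefix, stem).
LEXICON = {("ya", "duk"): "to speak"}


def lookup_key(segments, lexicon):
    """Pair the final stem with each earlier prefix until a lexicon key matches."""
    stem = segments[-1][0]
    for morph, _gloss in segments[:-1]:
        if (morph, stem) in lexicon:
            return (morph, stem)
    return None


key = lookup_key(SEGMENTS, LEXICON)
negatives = [i for i, (_, gloss) in enumerate(SEGMENTS) if gloss == "Neg"]
print(key, LEXICON[key], negatives)  # ('ya', 'duk') to speak [1, 3]
```

The list of negative positions also shows the other difficulty: negation is marked twice, in two separate places inside the word.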
So it is hard to teach people to chop off certain pieces as you
can in Turkish.
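By contrast, the Turkish approach of chopping suffixes off the right edge and looking up the infinitive fits in a few lines. This is a toy: the suffix lists and the four stems are a small hand-picked sample, the function names are made up, and it is not a Turkish morphological analyzer. It only tries the suffixes listed and ignores most Turkish morphophonology beyond choosing -mek or -mak by vowel harmony.

```python
from typing import Iterator, Optional

# Toy sample: a few person and tense/aspect suffixes in their
# vowel-harmony variants, and four verb stems with glosses.
PERSON = ["um", "üm", "ım", "im", "sun", "sün", "sın", "sin", "m", "n"]
TENSE = ["iyor", "ıyor", "uyor", "üyor", "yor",
         "ecek", "acak", "eceğ", "acağ", "di", "dı", "du", "dü"]
STEMS = {"gel": "come", "yaz": "write", "gör": "see", "oku": "read"}

FRONT, BACK = set("eiöü"), set("aıou")


def strip_options(word: str, suffixes: list[str]) -> Iterator[str]:
    """Yield the word unchanged, then the word minus each matching suffix."""
    yield word
    for suffix in suffixes:
        if word.endswith(suffix):
            yield word[: -len(suffix)]


def infinitive(stem: str) -> str:
    """Add -mek after a front last vowel, -mak after a back one."""
    for ch in reversed(stem):
        if ch in FRONT:
            return stem + "mek"
        if ch in BACK:
            return stem + "mak"
    raise ValueError(f"no vowel in stem {stem!r}")


def to_infinitive(word: str) -> Optional[str]:
    """Strip a person suffix, then a tense suffix, until a known stem remains."""
    for without_person in strip_options(word, PERSON):
        for stem in strip_options(without_person, TENSE):
            if stem in STEMS:
                return infinitive(stem)
    return None
```

With this sample, to_infinitive("geliyorum") gives "gelmek" and to_infinitive("yazacağım") gives "yazmak": each piece comes off the right edge in turn, which is exactly what the Carrier prefixes do not allow.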
Bill
--
Bill Poser, Linguistics, University of Pennsylvania
http://www.ling.upenn.edu/~wjposer/ billposer at alum.mit.edu
More information about the Lexicography
mailing list