[Lexicog] deciding on the citation form
Mike Maxwell
maxwell at LDC.UPENN.EDU
Sun Feb 25 03:58:23 UTC 2007
Piotr Banski wrote:
> I was wondering if you can point me to a book or a paper that discusses
> the rationale behind the lexicographer's choice of the citation form for
> the given lexeme class of the given language.
I keep referring to this book:
Bartholomew, Doris A. and Louise C. Schoenhals. 1983. Bilingual
dictionaries for indigenous languages. Mexico: Summer Institute of
Linguistics.
which has a good discussion on the selection of citation forms. I'm
sure there are other, more recent (and in-print) discussions--surely
someone on this list knows?
> For example, with verbs you usually go for the infinitive
> ...
> Similarly with nouns, where you probably usually want to go for NomSg
For all POSs, you generally want to go with the form which will most
easily allow the reader to determine the other forms. For nouns in
languages that have nominative-accusative case marking systems, this is
often the nominative form, since that form is often the least marked
form, i.e. it most closely resembles the stem (since for many languages
with this type of case marking, the nominative affix is null). Of
course there are other sorts of case marking systems, as well as
languages where there is no case marking.
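
To make the "least marked" criterion concrete, here is a minimal sketch in Python. It assumes a purely suffixing language and measures markedness crudely, as the number of characters a form adds beyond the stem. The function names and the measure are mine, for illustration only; the data are the Turkish forms of 'ev' "house".

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def least_marked(stem: str, forms: dict) -> str:
    """Return the label of the form that adds the least material to the stem.

    Assumption: markedness is approximated by the number of characters
    in the form beyond its shared prefix with the stem.
    """
    def extra(label: str) -> int:
        form = forms[label]
        return len(form) - shared_prefix_len(stem, form)

    return min(forms, key=extra)


# Turkish 'ev' "house": the nominative has a null affix.
ev_forms = {"nom": "ev", "acc": "evi", "gen": "evin", "dat": "eve"}
citation_label = least_marked("ev", ev_forms)
print(citation_label, ev_forms[citation_label])
```

A measure this simple breaks down with prefixing or stem-changing morphology, so it illustrates only the criterion, not a general procedure.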
Another issue is with obligatorily possessed nouns (frequently the case
with body parts). Again, one attempts to choose the least marked form,
if such exists.
For verbs, infinitives are the usual choice with many (all?) Romance
languages, but that is not necessarily a good choice for other
languages--and as you remark, many languages do not have an infinitive.
A third person singular present tense is often a good choice, since
again this tends to be the least marked form. A first person singular,
even if it were unmarked, would often be a poor choice, because many
verbs don't have such a form (like the verb 'rain' in most languages).
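
The point about defective verbs can be checked mechanically: a paradigm cell is a candidate citation form only if every verb has a form in it. A minimal sketch, assuming a toy two-verb lexicon in which Spanish 'llover' "to rain" is treated as lacking a first person singular (a simplification for illustration); the cell labels and function name are my own.

```python
# Toy lexicon: each verb maps paradigm-cell labels to wordforms.
paradigms = {
    "hablar": {"inf": "hablar", "1sg.pres": "hablo", "3sg.pres": "habla"},
    # Weather verb, treated here as having no 1sg form (simplification).
    "llover": {"inf": "llover", "3sg.pres": "llueve"},
}


def cells_filled_for_all(paradigms: dict) -> list:
    """Return the cell labels that are filled for every verb in the lexicon."""
    cell_sets = [set(cells) for cells in paradigms.values()]
    return sorted(set.intersection(*cell_sets))


candidates = cells_filled_for_all(paradigms)
print(candidates)
```

Here the first person singular drops out as a candidate, while the infinitive and the third person singular remain.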
> but then there's Latin, where (I'm not sure) you use {NomSg, GenSg}
> pairs -- is that correct? Or do you just go for NomSg, treating the
> GenSg ending as the first bit of grammatical information? (And is there
> any practical difference between these approaches apart from having to
> allow for a larger number of homonyms on the latter?)
I don't know how it's done in Latin dictionaries, but a common reason
for having more than one principal part is stem allomorphy. For
example, in Spanish some verbs are more or less irregular, in the sense
that the stem has an unpredictable allomorph. So for a verb like
'tener' "to have", one approach is to list the stem allomorphs, either
as individual lexical entries (alphabetized separately), or as
subentries. One doesn't normally list just the bare stem; instead, one
uses wordforms for particular parts of the paradigm where that stem
allomorph appears (such as tengo, tiene, tuvo, tendra--the latter with
an accent that I can't easily show here). (The alternative, used by
dictionaries like the University of Chicago Spanish-English dictionary,
is to define a bunch of "paradigms" that illustrate the stem allomorphs,
and to tell the user that a given verb conjugates like one of those
"paradigms". I have the scare quotes around "paradigm" because the
traditional notion of paradigm does not include stem allomorphy.)
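
The two layouts for principal parts (subentries under the main entry, or separately alphabetized entries that point back to it) can be sketched as two renderings of the same record. This is a minimal illustration with a made-up `Entry` class and made-up output formats, not the layout of any particular dictionary; sorting is by plain code point, which a real Spanish dictionary would replace with proper collation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    citation: str                      # citation form, e.g. the infinitive
    gloss: str
    principal_parts: List[str] = field(default_factory=list)


def as_subentries(entry: Entry) -> List[str]:
    """Main entry followed by its principal parts as indented subentries."""
    lines = [f"{entry.citation}  {entry.gloss}"]
    lines += [f"    {part}" for part in entry.principal_parts]
    return lines


def as_cross_references(entries: List[Entry]) -> List[str]:
    """Every principal part becomes its own alphabetized line pointing to the main entry."""
    rows = []
    for e in entries:
        rows.append((e.citation, f"{e.citation}  {e.gloss}"))
        for part in e.principal_parts:
            rows.append((part, f"{part}  see {e.citation}"))
    return [text for _, text in sorted(rows)]


# Wordforms showing the stem allomorphs of 'tener' (the last one carries the accent).
tener = Entry("tener", "to have", ["tengo", "tiene", "tuvo", "tendrá"])

print("\n".join(as_subentries(tener)))
print("\n".join(as_cross_references([tener])))
```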
Another reason for having principal parts is when the chosen citation
form does not sufficiently identify the paradigm or declension (in the
traditional sense, i.e. the set of affixes the word takes). Ideally,
you choose a citation form so it does identify the paradigm, but this
often conflicts with the criterion of choosing the least marked form.
> Then there's Hebrew (and, I guess, Arabic, Amharic, possibly Semitic in
> general (?)), where I don't know what happens. Just consonantal roots?
> That would mean rather complicated entries.
Traditionally, Arabic (and maybe Hebrew, I don't know) dictionaries are
alphabetized by the consonantal root, and the main entry for such a root
has subentries for the various stems. The alternative arrangement in
some modern print dictionaries is to list the stems as citation forms of
dictionary entries, with a cross-reference to the root's entry.
I might add that in SIL's FLEx, the lexicon allows you to avoid making
the distinctions between root-based and stem-based dictionaries until it
comes time to create a print version of the dictionary. Likewise, you
can avoid deciding whether any additional principal parts you may need
should be alphabetized separately, or included as subentries of the main
entry (or both).
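
The general idea of deferring the root-based versus stem-based decision until print time can be sketched as one store with two views. To be clear, this is not FLEx's data model or API; the record layout and function names are assumptions for illustration, and the Arabic stems are given in a simplified ASCII transliteration.

```python
from collections import defaultdict

# One flat store: each record carries both its root and its stem.
lexicon = [
    {"root": "k-t-b", "stem": "kataba", "gloss": "he wrote"},
    {"root": "k-t-b", "stem": "kitab", "gloss": "book"},
    {"root": "k-t-b", "stem": "maktab", "gloss": "office"},
    {"root": "d-r-s", "stem": "darasa", "gloss": "he studied"},
    {"root": "d-r-s", "stem": "madrasa", "gloss": "school"},
]


def root_based(lexicon: list) -> dict:
    """Roots as main entries, each with its stems as subentries."""
    by_root = defaultdict(list)
    for rec in lexicon:
        by_root[rec["root"]].append(rec["stem"])
    return {root: sorted(stems) for root, stems in sorted(by_root.items())}


def stem_based(lexicon: list) -> list:
    """Stems as main entries, each cross-referenced to its root."""
    ordered = sorted(lexicon, key=lambda rec: rec["stem"])
    return [(rec["stem"], rec["root"]) for rec in ordered]


print(root_based(lexicon))
print(stem_based(lexicon))
```

Because both views are computed from the same records, the choice of arrangement becomes a formatting decision rather than a data-entry decision.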
> for polysynthetic languages, where my imagination simply fails.
If you're using 'polysynthesis' here to mean the incorporation of nouns
into verbs (like English 'babysit'), then it depends on how productive
incorporation is. If it's fully productive, you needn't list each
incorporated form; but if it's only partially productive, as in Inuit
(Eskimo), then I suppose you'd list the forms that actually exist.
If OTOH you are using 'polysynthesis' to mean extremely agglutinating
languages, like Athabaskan languages, then perhaps someone else can
answer. My understanding is that nothing works very well; Bill Poser
has told me that there is a college course for native speakers of Navajo
to teach them how to use their language's dictionary. If it's that bad,
I guess it's fair to say there is no good way to make a print dictionary
of such a language. Fortunately, there are electronic alternatives
these days, provided someone implements a morphological parser for the
language.
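
As a toy illustration of why a parser helps with electronic lookup: the user types a full wordform, and the software strips affixes until it reaches a stem that has an entry. The sketch below uses invented prefixes and stems (not data from Navajo or any real language) and handles only simple prefix concatenation, which is far simpler than real Athabaskan morphology.

```python
# Invented morphemes, for illustration only.
PREFIXES = ("ka", "mi", "to")
STEMS = {"sel": "to carry", "ron": "to see"}


def parse(word: str):
    """Return (prefixes, stem) if the word is zero or more known prefixes
    followed by a known stem; otherwise return None."""
    if word in STEMS:
        return [], word
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = parse(word[len(prefix):])
            if rest is not None:
                return [prefix] + rest[0], rest[1]
    return None


def look_up(word: str):
    """Find the dictionary gloss for an inflected wordform, if it parses."""
    analysis = parse(word)
    if analysis is None:
        return None
    return STEMS[analysis[1]]


print(parse("mikasel"))
print(look_up("mikasel"))
```

The user never needs to know where the stem begins; that is the part a print dictionary forces the reader to work out by hand.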
Another language family where the need to list multiple forms of a word
arises, for slightly different reasons, is the Philippine languages,
such as Tagalog.
--
Mike Maxwell
maxwell at ldc.upenn.edu