[Lexicog] deciding on the citation form

Mike Maxwell maxwell at LDC.UPENN.EDU
Sun Feb 25 03:58:23 UTC 2007


Piotr Banski wrote:
> I was wondering if you can point me to a book or a paper that discusses
> the rationale behind the lexicographer's choice of the citation form for
> the given lexeme class of the given language. 

I keep referring to this book

Bartholomew, Doris A. and Louise C. Schoenhals. 1983. Bilingual 
dictionaries for indigenous languages. Mexico: Summer Institute of 
Linguistics.

which has a good discussion of the selection of citation forms.  I'm 
sure there are other, more recent (and in-print) discussions; surely 
someone on this list knows?

> For example, with verbs you usually go for the infinitive
> ...
> Similarly with nouns, where you probably usually want to go for NomSg

For every part of speech, you generally want to go with the form that 
will most easily allow the reader to determine the other forms.  For nouns in 
languages that have nominative-accusative case marking systems, this is 
often the nominative form, since that form is often the least marked 
form, i.e. it most closely resembles the stem (since for many languages 
with this type of case marking, the nominative affix is null).  Of 
course there are other sorts of case marking systems, as well as 
languages where there is no case marking.
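To make the "least marked form" criterion a little more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that markedness can be approximated by how many characters of suffix a wordform adds to the stem. That proxy is crude: it ignores prefixes, and it simply rules out any form whose stem has changed. The function names are hypothetical, and the Finnish forms of talo 'house' are just sample data.

```python
from typing import Dict, Tuple


def affix_length(stem: str, form: str) -> float:
    """Characters of suffixal material added to the stem.

    Returns infinity when the form does not simply extend the stem
    (e.g. because of stem allomorphy), so such forms are never preferred.
    """
    if form.startswith(stem):
        return len(form) - len(stem)
    return float("inf")


def least_marked(stem: str, paradigm: Dict[str, str]) -> Tuple[str, str]:
    """Return the (cell, form) pair closest to the bare stem.

    Ties go to whichever cell is listed first in the paradigm.
    """
    return min(paradigm.items(), key=lambda item: affix_length(stem, item[1]))


# Sample data: some singular case forms of Finnish talo 'house'.
talo = {
    "Nom.Sg": "talo",
    "Gen.Sg": "talon",
    "Part.Sg": "taloa",
    "Iness.Sg": "talossa",
    "Elat.Sg": "talosta",
    "Ill.Sg": "taloon",
}

print(least_marked("talo", talo))  # ('Nom.Sg', 'talo')
```

The nominative wins here because its case affix is null, which is exactly the situation described above for many nominative-accusative languages.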

Another issue is with obligatorily possessed nouns (frequently the case 
with body parts).  Again, one attempts to choose the least marked form, 
if one exists.

For verbs, infinitives are the usual choice with many (all?) Romance 
languages, but that is not necessarily a good choice for other 
languages; and as you remark, many languages do not have an infinitive. 
A third person singular present tense is often a good choice, since 
again this tends to be the least marked form.  A first person singular, 
even if it were unmarked, would often be a poor choice, because many 
verbs don't have such a form (like the verb 'rain' in most languages).
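The argument against a first person singular comes down to defective paradigms: the citation cell has to be one that every verb actually has. Below is a minimal Python sketch of that check, assuming a hypothetical ranked list of preferred cells. The Spanish verbs are only sample data (Spanish dictionaries of course cite the infinitive). 'llover' "to rain" appears with only its third person singular, as in ordinary use.

```python
from typing import Dict, List, Optional


def citation_cell(paradigms: Dict[str, Dict[str, str]],
                  preference: List[str]) -> Optional[str]:
    """Return the most preferred paradigm cell that every verb actually has."""
    if not paradigms:
        return None
    shared = set.intersection(*(set(cells) for cells in paradigms.values()))
    for cell in preference:
        if cell in shared:
            return cell
    return None


# Sample data: a few present-tense forms of Spanish verbs.
verbs = {
    "comer": {"1SG.PRES": "como", "3SG.PRES": "come"},
    "vivir": {"1SG.PRES": "vivo", "3SG.PRES": "vive"},
    "llover": {"3SG.PRES": "llueve"},
}

cell = citation_cell(verbs, ["1SG.PRES", "3SG.PRES"])
print(cell)                                       # 3SG.PRES
print([forms[cell] for forms in verbs.values()])  # ['come', 'vive', 'llueve']
```

Even though the first person singular is ranked first, the single defective verb pushes the choice to the third person singular.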

> but then there's Latin, where (I'm not sure) you use {NomSg, GenSg}
> pairs -- is that correct? Or do you just go for NomSg, treating the
> GenSg ending as the first bit of grammatical information? (And is there
> any practical difference between these approaches apart from having to
> allow for a larger number of homonyms on the latter?)

I don't know how it's done in Latin dictionaries, but a common reason 
for having more than one principal part is stem allomorphy.  For 
example, in Spanish some verbs are more or less irregular, in the sense 
that the stem has an unpredictable allomorph.  So for a verb like 
'tener' "to have", one approach is to list the stem allomorphs, either 
as individual lexical entries (alphabetized separately), or as 
subentries.  One doesn't normally list just the bare stem; instead one 
uses wordforms for particular parts of the paradigm where that stem 
allomorph appears (such as tengo, tiene, tuvo, tendrá).  (The 
alternative, used by dictionaries like the University of Chicago 
Spanish-English dictionary, is to define a bunch of "paradigms" that 
illustrate the stem allomorphs, and to tell the user that a given verb 
conjugates like one of those "paradigms".  I have the scare quotes 
around "paradigm" because the traditional notion of paradigm does not 
include stem allomorphy.)
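The two layouts for the stem allomorphs of 'tener' (separately alphabetized entries vs. subentries) can be produced from the same record. Here is a minimal Python sketch. The record layout, the field names, and the "see" wording of the cross-reference are all hypothetical. Sorting is by plain code point, whereas a real Spanish dictionary would need proper collation.

```python
from typing import Dict, List


def render(entries: List[Dict], layout: str = "separate") -> List[str]:
    """Print-ready lines for entries that may carry extra principal parts.

    layout="separate": each principal part gets its own alphabetized
    cross-reference entry.  layout="subentry": the principal parts are
    listed (indented) under the main entry only.
    """
    rows = []  # (sort key, lines printed for that headword)
    for e in entries:
        parts = e.get("principal_parts", [])
        main = [f"{e['lemma']} '{e['gloss']}'"]
        if layout == "subentry":
            main += [f"    {form}" for form in parts]
        rows.append((e["lemma"], main))
        if layout == "separate":
            rows += [(form, [f"{form} see {e['lemma']}"]) for form in parts]
    # Plain code-point order; a real Spanish dictionary needs proper collation.
    rows.sort(key=lambda row: row[0])
    return [line for _, lines in rows for line in lines]


entries = [
    {"lemma": "tener", "gloss": "to have",
     "principal_parts": ["tengo", "tiene", "tuvo", "tendrá"]},
    {"lemma": "hablar", "gloss": "to speak"},
]
print("\n".join(render(entries, "separate")))
print("\n".join(render(entries, "subentry")))
```

With the "separate" layout, 'tuvo' ends up several headwords away from 'tener', so a reader who meets it in a text can still find it. With the "subentry" layout, all the forms stay together but can only be reached through the lemma.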

Another reason for having principal parts is when the chosen citation 
form does not sufficiently identify the paradigm or declension (in the 
traditional sense, i.e. the set of affixes the word takes).  Ideally, 
you choose a citation form so that it does identify the paradigm, but 
this often conflicts with the criterion of choosing the least marked form.
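One way to test whether a citation form identifies the declension is to group the lexicon by the citation form's ending and look for endings that cover more than one class. The Python sketch below does this for a handful of Latin nouns (macrons omitted), comparing the nominative alone with the {NomSg, GenSg} pair from the original question. The ending lists are hand-picked for this sample only and are not a model of Latin morphology. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Set, Tuple

Noun = Tuple[str, str, int]  # (nominative sg, genitive sg, declension)

# Sample Latin nouns, macrons omitted.
NOUNS: List[Noun] = [
    ("rosa", "rosae", 1),
    ("dominus", "domini", 2),
    ("bellum", "belli", 2),
    ("corpus", "corporis", 3),
    ("nubes", "nubis", 3),
    ("manus", "manus", 4),
    ("dies", "diei", 5),
]

# Endings chosen by hand for this sample only.
NOM_ENDINGS = ["a", "us", "um", "es"]
GEN_ENDINGS = ["ae", "i", "is", "us", "ei"]


def ending(word: str, endings: List[str]) -> str:
    """Longest listed ending the word ends in ('' if none)."""
    return max((e for e in endings if word.endswith(e)), key=len, default="")


def ambiguous(nouns: List[Noun],
              key: Callable[[Noun], Hashable]) -> Dict[Hashable, Set[int]]:
    """Key values that fail to pick out one declension, with the declensions they cover."""
    classes: Dict[Hashable, Set[int]] = defaultdict(set)
    for noun in nouns:
        classes[key(noun)].add(noun[2])
    return {k: v for k, v in classes.items() if len(v) > 1}


def nom_only(noun: Noun) -> str:
    return ending(noun[0], NOM_ENDINGS)


def nom_gen(noun: Noun) -> Tuple[str, str]:
    return ending(noun[0], NOM_ENDINGS), ending(noun[1], GEN_ENDINGS)


print(ambiguous(NOUNS, nom_only))  # {'us': {2, 3, 4}, 'es': {3, 5}}
print(ambiguous(NOUNS, nom_gen))   # {}
```

In this sample, a nominative in -us could belong to the second, third, or fourth declension, but adding the genitive removes the ambiguity. This is the kind of case where a second principal part earns its place.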

> Then there's Hebrew (and, I guess, Arabic, Amharic, possibly Semitic in
> general (?)), where I don't know what happens. Just consonantal roots?
> That would mean rather complicated entries. 

Traditionally, Arabic (and maybe Hebrew, I don't know) dictionaries are 
alphabetized by the consonantal root, and the main entry for such a root 
has subentries for the various stems.  The alternative arrangement in 
some modern print dictionaries is to list the stems as citation forms of 
dictionary entries, with a cross-reference to the root's entry.
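The two arrangements can be generated from one underlying list of stems, each tagged with its root. Here is a minimal Python sketch, intended only to illustrate the idea and not to describe how any particular tool stores its data. The Arabic stems are in a simplified ASCII transliteration with vowel length unmarked. Roots and stems are sorted in Latin-alphabet order, not Arabic alphabetical order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Stem = Tuple[str, str, str]  # (stem, root, gloss)

# Sample Arabic stems, simplified transliteration (vowel length not marked).
STEMS: List[Stem] = [
    ("kataba", "k-t-b", "he wrote"),
    ("kitab", "k-t-b", "book"),
    ("maktab", "k-t-b", "office"),
    ("darasa", "d-r-s", "he studied"),
    ("dars", "d-r-s", "lesson"),
    ("madrasa", "d-r-s", "school"),
]


def root_based(stems: List[Stem]) -> List[str]:
    """Main entries are roots; the stems are indented subentries."""
    by_root: Dict[str, List[str]] = defaultdict(list)
    for stem, root, gloss in stems:
        by_root[root].append(f"    {stem} '{gloss}'")
    lines: List[str] = []
    for root in sorted(by_root):
        lines.append(root)
        lines.extend(by_root[root])
    return lines


def stem_based(stems: List[Stem]) -> List[str]:
    """Each stem is its own citation form, cross-referenced to its root."""
    return [f"{stem} '{gloss}' (see {root})" for stem, root, gloss in sorted(stems)]


print("\n".join(root_based(STEMS)))
print("\n".join(stem_based(STEMS)))
```

In the stem-based layout, 'madrasa' and 'maktab' sit next to each other even though they belong to different roots. The root-based layout groups them with their relatives instead.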

I might add that in SIL's FLEx, the lexicon allows you to avoid making 
the distinction between root-based and stem-based dictionaries until it 
comes time to create a print version of the dictionary.  Likewise, you 
can avoid deciding whether any additional principal parts you may need 
should be alphabetized separately, or included as subentries of the main 
entry (or both).

> for polysynthetic languages, where my imagination simply fails.

If you're using 'polysynthesis' here to mean the incorporation of nouns 
into verbs (like English 'babysit'), then it depends on how productive 
incorporation is.  If it's fully productive, you needn't list each 
incorporated form; but if it's only partially productive, like Inuit 
(Eskimo), then I suppose you'd list the forms that actually exist.

If OTOH you are using 'polysynthesis' to mean extremely agglutinating 
languages, like Athabaskan languages, then perhaps someone else can 
answer.  My understanding is that nothing works very well; Bill Poser 
has told me that there is a college course for native speakers of Navajo 
to teach them how to use their language's dictionary.  If it's that bad, 
I guess it's fair to say there is no good way to make a print dictionary 
of such a language.  Fortunately, there are electronic alternatives 
these days, provided someone implements a morphological parser for the 
language.
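To give a rough idea of what such an electronic lookup does, here is a naive affix-stripping sketch in Python. It tries every way of peeling known prefixes and suffixes off a word until what remains is a dictionary headword. Everything here is hypothetical toy data in English-like spelling, with no morphophonology at all. A real parser for Navajo or another Athabaskan language would need a far richer model of the morphology and phonology.

```python
from typing import List, Tuple

# Purely hypothetical toy data, not a description of any real language.
LEXICON = {"lock", "pack", "walk"}
PREFIXES = ["un", "re"]
SUFFIXES = ["ing", "ed", "s"]

Analysis = Tuple[Tuple[str, ...], str, Tuple[str, ...]]  # (prefixes, stem, suffixes)


def analyses(word: str) -> List[Analysis]:
    """Return every (prefixes, stem, suffixes) split whose stem is in LEXICON."""
    results: List[Analysis] = []

    def strip_suffixes(core: str, prefixes: Tuple[str, ...],
                       suffixes: Tuple[str, ...]) -> None:
        if core in LEXICON:
            results.append((prefixes, core, suffixes))
        for suf in SUFFIXES:
            if core.endswith(suf) and len(core) > len(suf):
                strip_suffixes(core[: -len(suf)], prefixes, (suf,) + suffixes)

    def strip_prefixes(core: str, prefixes: Tuple[str, ...]) -> None:
        strip_suffixes(core, prefixes, ())
        for pre in PREFIXES:
            if core.startswith(pre) and len(core) > len(pre):
                strip_prefixes(core[len(pre):], prefixes + (pre,))

    strip_prefixes(word, ())
    return results


print(analyses("unlocked"))  # [(('un',), 'lock', ('ed',))]
```

Because the search returns every successful split, a user can look up an inflected form and be pointed to the stem's entry without the print dictionary having to list that form at all.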

Another language family where the need to list multiple forms of a word 
arises, for slightly different reasons, is the Philippine languages, 
such as Tagalog.
-- 
	Mike Maxwell
	maxwell at ldc.upenn.edu

