[Lexicog] Theoretical constructs vs. practical reference dictionaries

Mike Maxwell maxwell at LDC.UPENN.EDU
Wed Feb 11 01:22:50 UTC 2004


Kenneth C. Hill wrote:
> Some nouns are invariant regardless of singular or multiple
> reference. As we found that such nouns had no plural form, we marked
> them as "n.sg." Rather than simplify the notation "n.sg." to simply
> "n." and let the absence of a cited plural form speak for itself, we
> have left stand the mark "n.sg."
>
> --- 'Lou Hohulin' <lou_hohulin at sil.org> wrote:
>> ...when it
>> comes to the practical task of producing helpful dictionaries and
>> grammars, we desperately need a theoretically sound basis for
>> deciding 'what goes where'.
>>
>> I am using the SIL-developed program, LinguaLinks, which allows me to
>> interlinearize text, and then, attested examples from the texts can
>> be seen in entries.
[etc.]

In addition to the theory vs. practical dictionary distinction, there is
another distinction that I think is useful: database vs. publication.  Or
perhaps a better term than 'database' would be 'knowledge base': a network
of information about a language (or languages, etc.), where there are links
among phonological, lexical, grammatical, and anthropological
observations--yes, even encyclopedic information.  Much of this linkage
among components was provided in LinguaLinks, and plans for SIL's FieldWorks
involve even more linkage.

For example, a noun that is singular (it's not clear to me whether that's
what Kenneth Hill's nouns above are, but I'll (mis-)use them as an example)
would be encoded in the lexical part of the database as a lexical entry
whose grammatical component points to the category 'noun' in the grammar
part of the database and includes a morphosyntactic feature '[number
singular]', which is also located in the grammar.  By following that link,
one comes to the grammatical description of nouns, number, and singulars in
the language.  From there, one can find out what other number features the
grammar uses: perhaps 'plural' and 'paucal'.  (I'm sure the latter is not
true of Hopi, but this is an illustration!)  For most such features--the
ones that the designers of FieldWorks have thought to include in the
shipping version--there will be further links to non-language-specific
information, e.g. a definition of 'paucal' in general linguistic terms.
There won't be any such link for language-specific features, such as
'tortilla-shaped' or 'rod-shaped' in Tzeltal, but there will be for lots of
more common stuff.
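
To make the linkage concrete, here is a minimal sketch in Python.  It is
not how LinguaLinks or FieldWorks actually store their data; the class
names (Feature, Category, LexicalEntry), the field names, the feature name
'shape', and the headword "example-noun" are all assumptions made up for
illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    """A feature value described in the grammar part of the knowledge base."""
    name: str                                  # e.g. "number"
    value: str                                 # e.g. "singular"
    general_definition: Optional[str] = None   # non-language-specific info, if any


@dataclass
class Category:
    """A grammatical category, with every feature value the grammar uses for it."""
    name: str
    features: List[Feature] = field(default_factory=list)

    def values_of(self, feature_name: str) -> List[str]:
        """List the values this category allows for one feature."""
        return [f.value for f in self.features if f.name == feature_name]


@dataclass
class LexicalEntry:
    """A lexical entry that points into the grammar instead of copying labels."""
    headword: str
    category: Category
    features: List[Feature]


# Grammar part (illustrative values only).
singular = Feature("number", "singular", "refers to exactly one item")
plural = Feature("number", "plural", "refers to more than one item")
paucal = Feature("number", "paucal", "refers to a few items")
# Language-specific feature: no link to a general definition.
tortilla = Feature("shape", "tortilla-shaped")
noun = Category("noun", [singular, plural, paucal, tortilla])

# Lexical part: the entry holds links to objects in the grammar.
entry = LexicalEntry("example-noun", noun, [singular])

# Following the link from the entry to the grammar:
print(entry.category.values_of("number"))   # ['singular', 'plural', 'paucal']
```

Because the entry holds a reference to the same Category object that the
grammar describes, a change to the grammar (say, adding a 'dual') is
visible from every entry without editing the lexicon.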

So Rich Rhodes should be happy with the fact that the grammar and the
lexicon are both represented in the knowledge base, with no firm dividing
line between them.  (I will, too--I didn't actually object to that idea.)

But when it comes to publishing a dictionary, you may not want to include
all this "stuff."  What you want for publication is a "view" of the
database, which is tailored to the audience.  For example, if the audience
for the dictionary is native speakers, morphosyntactic feature structures
are probably not going to be a big hit.  So a lexeme with a pointer to the
grammatical category 'noun' and the feature '[number singular]' will appear
to them as 'n.sg' (to use Kenneth's example again).
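
The idea of one stored entry with several audience-specific views can be
sketched as below.  This is only an illustration: the dictionary layout,
the abbreviation table, the two view functions, and the headword
"example-noun" are hypothetical, not taken from any SIL program.

```python
# Hypothetical entry as it might sit in the database.  Links into the
# grammar are shown as plain strings to keep the sketch short.
entry = {
    "headword": "example-noun",
    "category": "noun",
    "features": {"number": "singular"},
}

# Invented abbreviation table for the reader-facing view.
ABBREVIATIONS = {"noun": "n.", "singular": "sg"}


def linguist_view(entry):
    """Show the category and the full feature structure."""
    feats = " ".join(
        f"[{name} {value}]" for name, value in entry["features"].items()
    )
    return f"{entry['headword']}  {entry['category']} {feats}"


def speaker_view(entry):
    """Collapse category and features into a short label such as 'n.sg'."""
    label = ABBREVIATIONS[entry["category"]]
    label += ".".join(ABBREVIATIONS[v] for v in entry["features"].values())
    return f"{entry['headword']}  {label}"


print(linguist_view(entry))   # example-noun  noun [number singular]
print(speaker_view(entry))    # example-noun  n.sg
```

The stored entry is never changed; each publication is just a different
function applied to it.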

Again, for languages with lots of messy morphology (particularly prefixing
morphology), the database might contain roots or stems; but the user's view
might be some less abstract citation form, or (if it is an electronic
publication) a way to do parsing and lookup of fully inflected forms.
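
As a rough illustration of "parsing and lookup" in an electronic
publication, the sketch below strips known prefixes from a typed form and
looks up what is left among the stored stems.  The stems, glosses, and
prefixes are invented toy data, and naive prefix stripping is an assumption
chosen for brevity; a real language would need a proper morphological
parser.

```python
# Toy data: these stems and prefixes are invented, not from a real language.
STEMS = {"kala": "walk", "moto": "house"}
PREFIXES = ["", "ni", "ta", "nita"]   # "" covers forms typed as bare stems


def lookup(word):
    """Return (prefix, stem, gloss) for each split of word into prefix + known stem."""
    analyses = []
    for prefix in PREFIXES:
        if word.startswith(prefix):
            stem = word[len(prefix):]
            if stem in STEMS:
                analyses.append((prefix, stem, STEMS[stem]))
    return analyses


print(lookup("nitakala"))   # [('nita', 'kala', 'walk')]
print(lookup("kala"))       # [('', 'kala', 'walk')]
print(lookup("xyz"))        # []
```

The database still stores only the stems; the user never has to know
where the prefix ends.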

The database vs. publication distinction allows us to have our cake and eat
it too: we have the database with all the internal links, blurring any
distinction between lexicon and grammar (and your field notes, if you
choose); you can archive this, publish it for other linguists, or have it
buried with you.  We also have the publication (or publications), tailored
to the audience, which may choose to simplify, or to hide details.

Of course that doesn't answer Lou's question--she still has to decide what
information to include in the published version.  In fact, if she decides to
publish multiple versions, it makes it worse: she has to decide multiple
times!  But she can always change her mind, re-run the program, and get the
new version.

And it doesn't solve Kenneth's dilemma about how to display these nouns
without plural forms.  But so long as he's somehow indicated in the database
that they lack plurals, he can represent that fact in the published version
any way he chooses--by calling their POS 'n.sg', by having the words "(no
plural)" appear automatically, etc.
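
A small sketch of that choice, assuming the database records a simple
yes/no fact about whether a noun has a plural.  The function name, the
`style` parameter and its two values, and the sample headword are
hypothetical.

```python
def format_entry(headword, has_plural, style="pos"):
    """Render a noun entry; `style` picks how a missing plural is shown."""
    if has_plural:
        return f"{headword}  n."
    if style == "pos":
        # Fold the fact into the part-of-speech label.
        return f"{headword}  n.sg"
    if style == "note":
        # Keep the plain label and add an automatic note.
        return f"{headword}  n. (no plural)"
    raise ValueError(f"unknown style: {style}")


print(format_entry("example-noun", has_plural=False, style="pos"))
# example-noun  n.sg
print(format_entry("example-noun", has_plural=False, style="note"))
# example-noun  n. (no plural)
```

Changing one's mind about the display then means changing one argument and
re-running the program, not re-editing every entry.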

So enjoy your cakes!

Oh, BTW--the tight linkage between grammar and lexicon in LinguaLinks was
sometimes seen as a drawback ("monolithic" was the derogatory term).  But I
like to think of it as a feature, not a bug.

    Mike Maxwell
    Linguistic Data Consortium
    maxwell at ldc.upenn.edu



