[Lexicog] database structure (was Digest Number 343)
Ron Moe
ron_moe at SIL.ORG
Wed May 18 19:04:33 UTC 2005
> Claire Bowern wrote:
>> Coming up with a good database structure early on is really important
>> (I've learnt this the hard way, several times actually!)
> Mike Maxwell wrote:
> I can't speak for others on this list, but personally I'd like to hear
> more about that.
John Roberts wrote:
> Well, one of the main issues on structuring your database is whether you
> decide to have a form-based dictionary or a meaning-based dictionary.
One of the issues we've struggled with as we've designed the FieldWorks
software is the nature and relationship of major entries, minor entries, and
subentries. These correspond roughly to lexemes, variants, and complex forms,
respectively, and it is tempting to treat each pair as the same thing. But a
complex form like 'houseboat' can be printed as a major entry or as a subentry
under 'house' or 'boat'. This shows that 'complex form' and 'subentry' are
not the same thing. In the same way, an irregularly inflected form like
'went' can show up in several places in a dictionary:
go, goes, going, gone, went v. To move... [We can list it in a paradigm
field.]
went, see go. [We can present it as a simple minor entry.]
went v. Irregular [suppletive] past tense of 'go'. "Did John go to the
store? No, he went to a friend's." ety: originally the past tense of 'wend',
PIE *wendh-. [We can present it as a full major entry with part of speech,
definition, example sentence, and etymology.]
Some dictionaries also have minor subentries of complex forms:
placement n. 1. The act of placing or arranging. 2. The act or business of
finding jobs, lodgings, or other positions for applicants. --displacement n.
Here 'displacement' is a minor entry because it is small and is a
cross-reference to a major entry. It is *also* a subentry because it is
placed at the end of a major entry. It is a complex form subordinated to
another complex form. It is subordinated here but also appears as a major
entry alphabetized under 'D'.
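To make the distinction concrete, here is a minimal sketch in Python that records what a word is (lexeme, variant, inflected form, complex form) separately from how it is printed (major entry, minor entry, subentry). All of the names (Kind, Layout, Appearance, Word, describe) are invented for this illustration; it is not the FieldWorks data model, only one way the two notions could be kept apart.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    """What the word *is*, lexically (hypothetical categories)."""
    LEXEME = "lexeme"
    VARIANT = "variant"
    INFLECTED_FORM = "inflected form"
    COMPLEX_FORM = "complex form"


class Layout(Enum):
    """How the word is *printed* in one particular place."""
    MAJOR_ENTRY = "major entry"
    MINOR_ENTRY = "minor entry"
    SUBENTRY = "subentry"


@dataclass
class Appearance:
    layout: Layout
    under: Optional[str] = None  # host headword, used only for subentries


@dataclass
class Word:
    form: str
    kind: Kind
    related_to: List[str] = field(default_factory=list)  # bases or components
    appearances: List[Appearance] = field(default_factory=list)


def describe(word: Word) -> List[str]:
    """Return one line per place the word is printed."""
    lines = []
    for a in word.appearances:
        where = f" under '{a.under}'" if a.under else ""
        lines.append(f"{word.form}: {a.layout.value}{where}")
    return lines


# 'houseboat' is one complex form, printed in two different ways.
houseboat = Word(
    "houseboat", Kind.COMPLEX_FORM, related_to=["house", "boat"],
    appearances=[Appearance(Layout.MAJOR_ENTRY),
                 Appearance(Layout.SUBENTRY, under="house")],
)
# 'went' is one inflected form, printed as a minor and as a major entry.
went = Word(
    "went", Kind.INFLECTED_FORM, related_to=["go"],
    appearances=[Appearance(Layout.MINOR_ENTRY),
                 Appearance(Layout.MAJOR_ENTRY)],
)

for line in describe(houseboat) + describe(went):
    print(line)
```

In this sketch the kind of a word never changes when the print layout does, which is the point of the 'houseboat' and 'went' examples.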
The difficulty comes in understanding the relationship between various kinds
of words--root, stem, derivative, compound, phrase, inflected form,
irregularly inflected form, dialectal variant, register variant, spelling
variant, a single form with more than one meaning, a single form that
belongs to more than one part of speech, etc. Somehow we feel that all these
are "the same word," or "forms of the same word." When you structure a
database, you have to decide how these kinds of words are to be related. In
FieldWorks we have a list of "wordforms"--a list of every word that occurs
in our text corpus. We also have a list of lexical entries in the
dictionary. We then have to decide how to link a word in the list of
wordforms to an entry in the dictionary. So what do we do with a wordform
like 'went'? Do we treat it, like 'go', as a lexeme so that we can produce a
major entry for it? Alternatively, do we make it subordinate to 'go' as an
inflected form, but permit the user to create an ad hoc minor or major entry
for it? Is 'houseboat' inherently subordinate to 'house' or 'boat' or both,
or is it a lexeme in its own right with only a logical link to the two
roots? We may be able to provide a definitive answer for 'went' and
'houseboat', but there are lots and lots of different kinds of relationships
between words. Trying to design a program to handle every possible kind
becomes rather daunting.
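The linking question can be sketched as follows, again in Python and again with invented names (collect_wordforms, Link, lexicon, links, resolve). The sketch assumes the second option above: 'went' is stored as a wordform that points to the entry 'go' with a labelled relation. It makes no claim about how FieldWorks actually stores this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Link:
    target: str    # headword of the lexical entry the wordform resolves to
    relation: str  # free-text label, e.g. "irregular past tense"


def collect_wordforms(text: str) -> List[str]:
    """Build a sorted list of the distinct wordforms in a text."""
    forms = {w.strip('.,;:!?"\'').lower() for w in text.split()}
    forms.discard("")
    return sorted(forms)


# A toy lexicon (headword -> definition) and a toy table of links.
lexicon: Dict[str, str] = {"go": "v. To move", "house": "n.", "boat": "n."}
links: Dict[str, Link] = {
    "go": Link("go", "lexeme"),
    "went": Link("go", "irregular past tense"),
}


def resolve(wordform: str) -> Optional[Link]:
    """Return the link for a wordform, or None if it is not yet analysed."""
    link = links.get(wordform)
    if link is not None and link.target in lexicon:
        return link
    return None


wordforms = collect_wordforms("John went to the store.")
for form in wordforms:
    link = resolve(form)
    status = f"{link.relation} of '{link.target}'" if link else "not yet linked"
    print(f"{form}: {status}")
```

Treating 'went' as a lexeme in its own right would mean adding it to lexicon and linking it to itself; the relation to 'go' would then have to be recorded somewhere else.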
The problem is compounded by practical issues. Theoretically we may want to
treat every wordform as inherently equal to all others. But practical
dictionaries group inflected forms into a single entry. The fieldworker just
starting out on a language may not know that 'went' is the past tense of
'go', that 'on the house' is an idiom, or that 'unsung' is related to
'sing'. We need to make it easy for the user to enter all these forms and
gradually improve his analysis and treatment of them. We also need to make
it easy to describe irregular forms, variants, and complex forms, relate
them to their base forms, and print them in various ways. And last but not
least, we have to teach the user the theory behind all this, lexicographic
practice, and how the computer program works.
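One way to picture "enter first, analyze later" is a record whose link to a base form is optional and can be filled in afterwards. This is only a sketch; Record, add_form, relate, and unlinked are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    form: str
    gloss: str = ""
    base: Optional[str] = None      # unknown until the user analyses the form
    relation: Optional[str] = None  # e.g. "irregular past tense", "idiom"


records: Dict[str, Record] = {}


def add_form(form: str, gloss: str = "") -> Record:
    """Enter a form with no analysis required; keep any existing record."""
    return records.setdefault(form, Record(form, gloss))


def relate(form: str, base: str, relation: str) -> Record:
    """Later on, record that `form` is related to an already entered base."""
    if base not in records:
        raise KeyError(f"base form {base!r} has not been entered yet")
    rec = records[form]
    rec.base, rec.relation = base, relation
    return rec


def unlinked() -> List[str]:
    """Forms that currently have no link to a base form."""
    return sorted(f for f, r in records.items() if r.base is None)


# The fieldworker starts by simply entering what was heard.
for form in ["go", "went", "sing", "unsung"]:
    add_form(form)

# Later the analysis improves and the links are added.
relate("went", "go", "irregular past tense")
relate("unsung", "sing", "derivative")
print(unlinked())
```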
Ron Moe