[Lexicog] dictionary software

Richard Rhodes rrhodes at COGSCI.BERKELEY.EDU
Fri Mar 19 21:14:41 UTC 2004


Connor asked about relational databases and got several answers, not
all of which were entirely accurate. But since I actually published a
dictionary directly from a relational database (Eastern Ojibwe,
Chippewa, Ottawa Dictionary, Mouton de Gruyter 1985), I'd like to try
to answer the question. In fact, each half of that dictionary was a
report (in the technical sense) fed directly to a typesetter. (A
fuller discussion can be found in my article in Making Dictionaries:
Preserving Indigenous Languages of the Americas [Frawley, Hill, and
Munro, eds.])

A relational database consists of a set of tables. Each table
represents a particular relation in which each entry in the table is
unique. The tables are linked to one another, logically, so that the
information from various tables can be combined in unambiguous ways.

(By now the hardcore computer types are probably tearing their hair
out, but I'm trying to say this in as non-technical a way as
possible, pace the late Edgar Codd.)

So for bilingual dictionary makers, a sensible set of tables might
look like this (pardon me if I have some of the German details wrong,
my German is very rusty and I don't have a dictionary at hand):

TABLE 1 (Core relation)

key  : L1 phrase             : L2 phrase
1735 : get something started : etwas in Schwung bringen
2435 : start to do something : beginnen, etwas zu machen
3467 : start                 : beginnen
0746 : swing                 : Schwung


TABLE 2 (L1 index)

key  : L1 key word : ps
1735 : start       : vt
3467 : start       : vt
3467 : start       : vi
2435 : start       : vi
0089 : start       : n
4301 : swing       : vt
3988 : swing       : vi
0746 : swing       : n


TABLE 3 (L2 index)

key  : L2 key word : ps
1735 : Schwung     : n m
4589 : beginnen    : vt
3467 : beginnen    : vt
2435 : beginnen    : vi
0746 : Schwung     : n m


TABLE 4 (L1 grammar)

L1 word : ps : grammatical info
start   : vt : started, started
start   : vi : started, started
swing   : n  : swings


TABLE 5 (L2 grammar)

L2 word  : ps  : grammatical info
Schwung  : n m : no plural
beginnen : vt  : begann, begonnen
beginnen : vi  : begann, begonnen
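
For anyone who wants to experiment, here is a minimal sketch of these five tables in SQLite, using Python's standard sqlite3 module. That is just a convenient tool for illustration, not what the dictionary was built with, and the table and column names (core, l1_index, entry_key, and so on) are hypothetical. Tables 1, 2 and 3 carry the arbitrary key; Tables 4 and 5 are keyed on the word plus part of speech. Because the same key can appear more than once in the index tables (3467 is both start vt and start vi), their rows are keyed on the whole triple.

```python
import sqlite3

# Hypothetical table/column names; rows copied from Tables 1-5 above.
# Keys are TEXT so that leading zeros ("0746", "0089") survive.
# The "get something started" row uses key 1735, as Tables 2 and 3 do.
SCHEMA = """
CREATE TABLE core       (entry_key TEXT PRIMARY KEY, l1_phrase TEXT, l2_phrase TEXT);
CREATE TABLE l1_index   (entry_key TEXT, word TEXT, ps TEXT, PRIMARY KEY (entry_key, word, ps));
CREATE TABLE l2_index   (entry_key TEXT, word TEXT, ps TEXT, PRIMARY KEY (entry_key, word, ps));
CREATE TABLE l1_grammar (word TEXT, ps TEXT, info TEXT, PRIMARY KEY (word, ps));
CREATE TABLE l2_grammar (word TEXT, ps TEXT, info TEXT, PRIMARY KEY (word, ps));
"""
ROWS = {
    "core": [("1735", "get something started", "etwas in Schwung bringen"),
             ("2435", "start to do something", "beginnen, etwas zu machen"),
             ("3467", "start", "beginnen"),
             ("0746", "swing", "Schwung")],
    "l1_index": [("1735", "start", "vt"), ("3467", "start", "vt"),
                 ("3467", "start", "vi"), ("2435", "start", "vi"),
                 ("0089", "start", "n"), ("4301", "swing", "vt"),
                 ("3988", "swing", "vi"), ("0746", "swing", "n")],
    "l2_index": [("1735", "Schwung", "n m"), ("4589", "beginnen", "vt"),
                 ("3467", "beginnen", "vt"), ("2435", "beginnen", "vi"),
                 ("0746", "Schwung", "n m")],
    "l1_grammar": [("start", "vt", "started, started"),
                   ("start", "vi", "started, started"),
                   ("swing", "n", "swings")],
    "l2_grammar": [("Schwung", "n m", "no plural"),
                   ("beginnen", "vt", "begann, begonnen"),
                   ("beginnen", "vi", "begann, begonnen")],
}

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for table, rows in ROWS.items():
    # Every table in this sketch has exactly three columns.
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
conn.commit()

for table in ROWS:
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    print(f"{table}: {count} rows")
```
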

As you'll notice, this is very redundant. The uniqueness that Mike
Maxwell was referring to is the mathematical consistency that gives
unique answers. Despite the redundancy, these tables connect to one
another in unique ways via the keys. As John Koontz mentioned, each
line of each table has to have a unique key. But the key can be
either a single arbitrarily assigned value, like those in tables 1,
2, and 3, or a unique pairing, like the word plus its part of speech
in tables 4 and 5. In my dictionary work I had no arbitrary keys;
everything was done by connecting pairings (or larger tuples in many
cases).
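
A small sketch of the second kind of key, assuming SQLite via Python's sqlite3 module and a hypothetical table name: when the word plus its part of speech is itself the primary key, the database refuses a second row for the same pairing while still allowing the same word under a different part of speech.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name. No arbitrary key: the (word, ps) pair itself
# is the primary key, so each word/part-of-speech combination appears once.
conn.execute("CREATE TABLE l1_grammar (word TEXT, ps TEXT, info TEXT, "
             "PRIMARY KEY (word, ps))")
conn.executemany("INSERT INTO l1_grammar VALUES (?, ?, ?)", [
    ("start", "vt", "started, started"),
    ("start", "vi", "started, started"),  # same word, different ps: allowed
    ("swing", "n", "swings"),
])
conn.commit()


def try_insert(row):
    """Return True if the row was accepted, False if its key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO l1_grammar VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("swing", "vt", "swung, swung")))  # True: new (word, ps) pair
print(try_insert(("start", "vt", "starts")))        # False: (start, vt) exists
```
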
Queries are then made by joining the tables together and querying the
resulting table for a particular configuration of information. In
this case you join these tables together to generate lines that look
like the following:

get something started : etwas in Schwung bringen : start : vt : started, started
start to do something : beginnen, etwas zu machen : start : vi : started, started
start : beginnen : start : vt : started, started
start : beginnen : start : vi : started, started
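
Roughly what that join looks like in SQL, again as a sketch assuming SQLite and the hypothetical names core, l1_index and l1_grammar. The core relation is joined to the L1 index on the key, and that result is joined to the L1 grammar table on the word plus part of speech. Index rows whose key has no match in this small sample of Table 1 (0089, 4301, 3988) simply drop out of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core       (entry_key TEXT PRIMARY KEY, l1_phrase TEXT, l2_phrase TEXT);
    CREATE TABLE l1_index   (entry_key TEXT, word TEXT, ps TEXT);
    CREATE TABLE l1_grammar (word TEXT, ps TEXT, info TEXT, PRIMARY KEY (word, ps));
    INSERT INTO core VALUES
        ('1735', 'get something started', 'etwas in Schwung bringen'),
        ('2435', 'start to do something', 'beginnen, etwas zu machen'),
        ('3467', 'start', 'beginnen'),
        ('0746', 'swing', 'Schwung');
    INSERT INTO l1_index VALUES
        ('1735', 'start', 'vt'), ('3467', 'start', 'vt'), ('3467', 'start', 'vi'),
        ('2435', 'start', 'vi'), ('0089', 'start', 'n'), ('4301', 'swing', 'vt'),
        ('3988', 'swing', 'vi'), ('0746', 'swing', 'n');
    INSERT INTO l1_grammar VALUES
        ('start', 'vt', 'started, started'), ('start', 'vi', 'started, started'),
        ('swing', 'n', 'swings');
""")

# Core -> L1 index on the key, then L1 index -> L1 grammar on (word, ps).
L1_JOIN = """
    SELECT c.l1_phrase, c.l2_phrase, i.word, i.ps, g.info
    FROM core AS c
    JOIN l1_index   AS i ON i.entry_key = c.entry_key
    JOIN l1_grammar AS g ON g.word = i.word AND g.ps = i.ps
    WHERE i.word = ?
    ORDER BY c.entry_key, i.ps DESC
"""
rows = conn.execute(L1_JOIN, ("start",)).fetchall()
for row in rows:
    print(" : ".join(row))
```
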

When appropriately queried, these lines will generate an entry like the following:

start vi beginnen, start to do something, beginnen, etwas zu machen;
vt beginnen; get something started, etwas in Schwung bringen; past
started, part started.
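
One way to do that last formatting step, sketched in plain Python: group the joined rows by part of speech and string the senses together. The punctuation, the ordering (vi before vt, bare headword sense first), and the assumption that the grammar field holds past tense then past participle are illustrative choices for this sketch, not a description of how the Ojibwe dictionary was actually formatted.

```python
from itertools import groupby

# Joined rows: (L1 phrase, L2 phrase, headword, part of speech, grammar info),
# typed in by hand from the join output above so this sketch stands alone.
JOINED = [
    ("get something started", "etwas in Schwung bringen", "start", "vt", "started, started"),
    ("start to do something", "beginnen, etwas zu machen", "start", "vi", "started, started"),
    ("start", "beginnen", "start", "vt", "started, started"),
    ("start", "beginnen", "start", "vi", "started, started"),
]


def format_entry(rows):
    """Assemble one L1 headword entry from joined rows (illustrative format)."""
    headword = rows[0][2]
    parts = [headword]
    # Sort by part of speech, then put the bare-headword sense before phrases.
    ordered = sorted(rows, key=lambda r: (r[3], r[0] != headword))
    for ps, group in groupby(ordered, key=lambda r: r[3]):
        senses = [l2 if l1 == headword else f"{l1}, {l2}"
                  for l1, l2, _, _, _ in group]
        parts.append(f"{ps} {'; '.join(senses)};")
    # Assumes the grammar field is "past, past participle" and is the same
    # for every part of speech, which holds for start vt/vi in Table 4.
    past, participle = rows[0][4].split(", ")
    parts.append(f"past {past}, part {participle}.")
    return " ".join(parts)


print(format_entry(JOINED))
```
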

A parallel set of joins and queries will produce the other half of
the dictionary.
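
For the L2 half, the same pattern runs through Tables 3 and 5 instead. A sketch under the same assumptions (SQLite, hypothetical names), looking up beginnen. Key 4589 has no row in the sample core relation, so it drops out of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core       (entry_key TEXT PRIMARY KEY, l1_phrase TEXT, l2_phrase TEXT);
    CREATE TABLE l2_index   (entry_key TEXT, word TEXT, ps TEXT);
    CREATE TABLE l2_grammar (word TEXT, ps TEXT, info TEXT, PRIMARY KEY (word, ps));
    INSERT INTO core VALUES
        ('1735', 'get something started', 'etwas in Schwung bringen'),
        ('2435', 'start to do something', 'beginnen, etwas zu machen'),
        ('3467', 'start', 'beginnen'),
        ('0746', 'swing', 'Schwung');
    INSERT INTO l2_index VALUES
        ('1735', 'Schwung', 'n m'), ('4589', 'beginnen', 'vt'),
        ('3467', 'beginnen', 'vt'), ('2435', 'beginnen', 'vi'),
        ('0746', 'Schwung', 'n m');
    INSERT INTO l2_grammar VALUES
        ('Schwung', 'n m', 'no plural'),
        ('beginnen', 'vt', 'begann, begonnen'),
        ('beginnen', 'vi', 'begann, begonnen');
""")

# Same shape as the L1 join, but the L2 phrase comes first and the index
# and grammar tables are the German ones.
L2_JOIN = """
    SELECT c.l2_phrase, c.l1_phrase, i.word, i.ps, g.info
    FROM core AS c
    JOIN l2_index   AS i ON i.entry_key = c.entry_key
    JOIN l2_grammar AS g ON g.word = i.word AND g.ps = i.ps
    WHERE i.word = ?
    ORDER BY c.entry_key
"""
rows = conn.execute(L2_JOIN, ("beginnen",)).fetchall()
for row in rows:
    print(" : ".join(row))
```
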

It can be very hard to set things up properly. There's a lot involved
in figuring out what the relations (in the technical sense) are;
there's a whole process, called normalization, for doing that. But
it's worth doing. Once your data is normalized and entered, you can't
get away with anything. The whole system forces you into a
consistency never before possible in lexicography.
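
As an illustration of the database not letting you get away with anything, here is a sketch (SQLite via Python's sqlite3; table names hypothetical, key 9999 invented for the example) in which the index table declares a foreign key back to the core relation. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE core (
        entry_key TEXT PRIMARY KEY,
        l1_phrase TEXT NOT NULL,
        l2_phrase TEXT NOT NULL);
    CREATE TABLE l1_index (
        entry_key TEXT NOT NULL REFERENCES core (entry_key),
        word      TEXT NOT NULL,
        ps        TEXT NOT NULL,
        PRIMARY KEY (entry_key, word, ps));
    INSERT INTO core VALUES ('3467', 'start', 'beginnen');
    INSERT INTO l1_index VALUES ('3467', 'start', 'vt');
""")


def add_index_row(row):
    """Try to add an L1 index row; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO l1_index VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(add_index_row(("3467", "start", "vi")))  # True: key exists in core
print(add_index_row(("9999", "start", "vt")))  # False: no core row 9999
print(add_index_row(("3467", "start", "vt")))  # False: duplicate row
```
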

BTW, for people in a Mac environment, I recommend ACIUS' 4th
Dimension. It has a decent interface, and it can be fast even for
largish databases, especially if you compile the programs you write
to do your dirty work.

Rich Rhodes



>Hello, friends!
>
>For those of us with no experience in the matter: what is it that makes
>relational databases so useful to lexicographic and/or other linguistic
>work?  I'm still not clear on what it is that makes a relational database
>distinct from any other kind, so a nice concrete example of
>
>(a) what makes the distinction and
>(b) how we (in my case *beginning*) lexicographers can use it to our
>benefit
>
>would be much appreciated.
>
>Goodbye,
>your friend


--
******************************************************************

Richard A. Rhodes
Associate Dean, Undergraduate Division
Interim Director, Office of Undergraduate Advising
College of Letters & Science
113 Campbell Hall
University of California
Berkeley, CA 94720-2924
Phone: (510) 643-4184
FAX: (510) 642-2372

******************************************************************