[Lexicog] dictionary software

Fri Mar 19 04:59:35 UTC 2004

On Thu, 18 Mar 2004, Mike Maxwell wrote:
> Technically, LL was an object oriented database, which is not quite the same
> thing as a relational database.  The two are similar with respect to the
> notion that "one piece of data is represented only once."  (The technical
> term is "normalization".)

Precisely.  A relational database is one that follows the relational
model, which assumes that all data can be represented by mathematical
relationships from one domain into another.  The idealized scheme for
representing such a relationship is a table representing the function
from a set of unique independent values (the key) to a corresponding set
of (possibly non-unique) dependent values.  See
http://en.wikipedia.org/wiki/Relational_database.
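
As a rough illustration of the "function from key to dependent values"
idea, here is a minimal Python sketch.  The sample rows and the name
build_relation are made up for the example and do not come from any
particular database product.

```python
# Hypothetical toy data: each row is (key, dependent value, dependent value).
rows = [
    ("dog", "noun", "canine"),
    ("run", "verb", "move fast"),
    ("cat", "noun", "feline"),
]


def build_relation(rows):
    """Map each unique key to its tuple of dependent values.

    Rejecting a repeated key is what makes the table a function:
    every key determines exactly one row.
    """
    relation = {}
    for key, *values in rows:
        if key in relation:
            raise ValueError(f"duplicate key: {key!r}")
        relation[key] = tuple(values)
    return relation


lexicon = build_relation(rows)
```

Note that the dependent values need not be unique ("dog" and "cat" are
both nouns), while the keys must be.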

I think, however, that in practice people mean by "relational database" a
collection of data represented conceptually by a set of one or more
multi-column tables, usually just one such table, with the rows sorted on
some key field, and the keys not necessarily unique, though due to the
sorting all identical keys will be adjacent.  In other words, a tabularly
conceived set of identically organized records (or tuples) of fields.
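
A small Python sketch of this looser, everyday notion: rows sorted on a
key that need not be unique, so that rows sharing a key end up adjacent
and can be collected in one pass.  The sample rows are invented for the
example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (key, part of speech, gloss). The key "bank" repeats.
rows = [
    ("bank", "noun", "river edge"),
    ("run", "verb", "move fast"),
    ("bank", "noun", "financial institution"),
]

# Sorting on the key field brings identical keys together.
rows.sort(key=itemgetter(0))

# groupby only merges *adjacent* equal keys, which is why the sort matters.
grouped = {
    key: [row[1:] for row in group]
    for key, group in groupby(rows, key=itemgetter(0))
}
```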

In classical commercial applications the set of fields in each tuple is
identical, and particular fields are also fixed in length.  Linguistic
databases often allow missing fields, repeated fields, repeated sequences
of fields, and even some variability in ordering of fields.  The structure
of a record is at best defined by a sort of regular expression or pattern.
Fields are also usually variable length strings rather than fixed length
strings or fixed precision numbers.  In short, a linguistic database is
more like a word processor document - a list of somewhat standardized
paragraphs in which there is a key field followed by a list of named
subparagraphs.  Sometimes these are (or were) referred to as "textbases."
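
To make the "regular expression or pattern" remark concrete, here is a
minimal Python sketch that checks the sequence of field labels in a
record against a pattern.  The labels (lx, hm, ps, ge, xv, xe) and the
particular pattern are assumptions chosen for the example, not a
standard that any given tool enforces.

```python
import re

# Assumed record shape: a headword (lx), an optional homonym number (hm),
# then one or more blocks of: part of speech (ps), one or more glosses (ge),
# and zero or more example/translation pairs (xv xe).
RECORD_PATTERN = re.compile(r"lx (hm )?(ps (ge )+(xv xe )*)+")


def is_well_formed(labels):
    """Return True if the list of field labels fits the assumed pattern."""
    # Each label is followed by one space so labels cannot run together.
    sequence = "".join(label + " " for label in labels)
    return RECORD_PATTERN.fullmatch(sequence) is not None
```

For instance, is_well_formed(["lx", "ps", "ge", "ge", "xv", "xe"]) holds
under this pattern, while a record that starts with "ps" or that has an
"xv" without its "xe" does not.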

You can say "conceptually represented as a table," because while the
tabular logic is supported for the user, the actual implementation might
be quite different.  Each field of each table might be a mass storage file
of some sort, for example.  If the fields are fixed in length, such a file
will generally consist of a series of records containing one field each,
in the same order as the records in the key file.  There might be an
associated inverse index file, sorted by the values of the field and
attaching to each value the index of the corresponding key, to facilitate
lookups on non-keys.  If the fields are variable in length, various
expedients such as indexes of pointers and "hashing" will be used to
facilitate manipulation.
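
A minimal Python sketch of the column-plus-inverse-index arrangement,
using in-memory lists to stand in for the files.  The data and the
function names are hypothetical; a real product would do this on disk
and far more carefully.

```python
from bisect import bisect_left, bisect_right

# Stand-ins for two files: the key file and one field stored in key order.
keys = ["cat", "dog", "run", "sit"]
pos_column = ["noun", "noun", "verb", "verb"]


def build_inverse_index(column):
    """Return (value, row number) pairs sorted by value."""
    return sorted((value, row) for row, value in enumerate(column))


def lookup(index, keys, value):
    """Find the keys of all rows whose field equals value, by binary search."""
    lo = bisect_left(index, (value, -1))           # before the first match
    hi = bisect_right(index, (value, len(keys)))   # after the last match
    return [keys[row] for _, row in index[lo:hi]]


pos_index = build_inverse_index(pos_column)
verbs = lookup(pos_index, keys, "verb")
```

The point of the sorted index is that a lookup on a non-key field no
longer has to scan every record.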

These elaborate approaches are used in highly efficient commercial
products.  Something like Shoebox just represents a database as a text
file in which each field is a line beginning with a field label.  This is
quite workable because Shoebox is primarily a tool for entering a single
data table and doing certain simple patterns of retrieval, mostly for
editing and formatting, sometimes for glossing interlinear text.  It's not
designed to do something like efficiently retrieve the morphosyntactic
formula field of all verbs whose citation form matches a citation form in
a table mapping citation forms to surface forms found in a certain
document.
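
For comparison, here is a minimal Python sketch of the text-file
approach and of the kind of query just described, done by brute-force
scanning.  It assumes backslash-prefixed field labels with \lx starting
each record; the \mf label for the morphosyntactic formula, the sample
data, and the function names are all made up for the example.

```python
SAMPLE = r"""
\lx run
\ps v
\mf V-TNS
\ge move fast

\lx dog
\ps n
\ge canine

\lx sit
\ps v
\mf V-TNS
"""


def parse_records(text, key_label="lx"):
    """Split labelled lines into records, each a list of (label, value)."""
    records = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        label, _, value = line[1:].partition(" ")
        if label == key_label:
            records.append([])  # the key field opens a new record
        if records:
            records[-1].append((label, value.strip()))
    return records


def first(record, label):
    """Return the first value carrying this label, or None if absent."""
    return next((v for l, v in record if l == label), None)


def formulas_of_attested_verbs(records, surface_forms, document_words):
    """Map citation form -> formula for verbs attested in the document.

    surface_forms maps each citation form to its known surface forms.
    """
    words = set(document_words)
    attested = {c for c, forms in surface_forms.items() if words & set(forms)}
    return {
        first(r, "lx"): first(r, "mf")
        for r in records
        if first(r, "ps") == "v" and first(r, "lx") in attested
    }


records = parse_records(SAMPLE)
surface_forms = {"run": ["ran", "runs"], "dog": ["dogs"], "sit": ["sat"]}
result = formulas_of_attested_verbs(records, surface_forms, ["the", "dogs", "ran"])
```

This works, but every query rereads every record, which is the
inefficiency that the indexed commercial designs are built to avoid.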
