[Lexicog] introduction and too many issues

Wayne Leman lexicography2004 at YAHOO.COM
Thu Jan 15 15:26:10 UTC 2004


--- In lexicographylist at yahoogroups.com, "Ron Moe" <ron_moe at s...>
wrote:
The question of organization of a dictionary is dominated by one
overriding fact--words are multifaceted, having a phonological
representation, a phonetic pronunciation, a derivational structure, an
affixation potential, a syntactic distribution pattern, a meaning,
links to other words through lexical relations, inclusion in sets of
words such as semantic domains, and an etymological history. Although
the tradition has been to organize a dictionary alphabetically
according to phonological representation, this is just one possible
organization. It is popular because it is relatively easy to learn the
alphabet in order to find words, and because the phonological shape is
fairly stable and high in the level of awareness of unsophisticated
speakers. However, except in the cases of a non-phonemic orthography
where people need to find the correct spelling of a word, the phonemic
shape is one of the most uninteresting facets of a word. Most people
are far more interested in the meaning or in finding other members of
the lexical set to which the word belongs. Hence the popularity of the
thesaurus.

Fortunately we have the means of satisfying the needs of most users of
a dictionary by publishing electronically. Unfortunately, most users
want a hard copy. Publishing in hard copy forces us to choose one and
only one format, unless we are willing to spend and possibly lose lots
of money. If we publish electronically, we need a format (preferably
standardized) for the data, and a computer program to make the data
accessible to the user. The computer program would have to be
extremely user friendly, yet powerful enough to enable sophisticated
users to search, sort, and display the data in multiple views. A
standardized format would also enable the user to export the data to
specialized applications. Did the EMELD conference mentioned below
suggest such a standard, or do we need to develop one? Has anyone
written a computer program that would enable an electronically
published dictionary to be used by the general public?
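
To make the idea of one data set with multiple views concrete, here is
a minimal sketch in Python. It is not a proposed standard: the Entry
record, its field names, and the sample words are all assumptions made
up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical record layout; a real standard would carry many more fields.
    headword: str
    part_of_speech: str
    gloss: str
    domains: list[str] = field(default_factory=list)


def alphabetical_view(entries):
    """Return the entries ordered by headword, as in a printed dictionary."""
    return sorted(entries, key=lambda e: e.headword.lower())


def domain_view(entries):
    """Return a thesaurus-like view: semantic domain -> headwords in that domain."""
    view = defaultdict(list)
    for entry in alphabetical_view(entries):
        for domain in entry.domains:
            view[domain].append(entry.headword)
    return dict(view)


# Illustrative sample data only.
ENTRIES = [
    Entry("river", "n", "large natural stream of water", ["water", "geography"]),
    Entry("canoe", "n", "light narrow boat", ["water", "transport"]),
    Entry("walk", "v", "move on foot", ["transport"]),
]

for entry in alphabetical_view(ENTRIES):
    print(entry.headword, entry.part_of_speech, entry.gloss)
print(domain_view(ENTRIES))
```

The same list of records feeds both views, which is the advantage an
electronic edition has over a printed one.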

Much has been written about publishing in hard copy. The main points
are as follows: (1) Each language family has developed certain
traditions, usually due to typological features. For instance in most
Bantu dictionaries the nouns are alphabetized by the initial class
prefix. Verbs are alphabetized by the stem, ignoring the prefix. The
prefix is sometimes printed in italics, since the stem is not a
naturally occurring word. Adjectives are also alphabetized by the
stem, with an initial hyphen to indicate the missing concordial
prefix. The system works, but is cumbersome and results in some
oddities. So the lexicographer must investigate the tradition and
decide if he will follow it.
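
The Bantu convention described in (1) amounts to a sort key that
depends on part of speech. A minimal sketch in Python follows, using a
few common Swahili forms as sample data. The BantuEntry class and its
field names are hypothetical, and parentheses stand in for the italic
prefix that a printed dictionary might use.

```python
from dataclasses import dataclass


@dataclass
class BantuEntry:
    prefix: str  # class or infinitive prefix, e.g. "m", "ki", "ku"
    stem: str
    pos: str     # "n", "v", or "adj"

    def sort_key(self) -> str:
        # Nouns are filed under prefix + stem; verbs and adjectives under the bare stem.
        return self.prefix + self.stem if self.pos == "n" else self.stem

    def display(self) -> str:
        if self.pos == "n":
            return self.prefix + self.stem
        if self.pos == "v":
            # Parentheses replace the italics used in print.
            return f"({self.prefix}){self.stem}"
        # Adjectives: a hyphen marks the missing concordial prefix.
        return "-" + self.stem


ENTRIES = [
    BantuEntry("m", "tu", "n"),       # mtu 'person'
    BantuEntry("ku", "soma", "v"),    # kusoma 'to read'
    BantuEntry("", "zuri", "adj"),    # -zuri 'good'
    BantuEntry("ki", "tabu", "n"),    # kitabu 'book'
    BantuEntry("ku", "andika", "v"),  # kuandika 'to write'
]

ordered = [e.display() for e in sorted(ENTRIES, key=BantuEntry.sort_key)]
print(ordered)
```

Note that the two verbs end up far apart even though both begin with
ku- in speech, which is one of the oddities a user has to learn.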

(2) If the lexicographer alphabetizes the dictionary, but also wants
to include other organizations, he can utilize either appendixes or
boxes (Jess's 'islands' in the message below). Most bilingual
dictionaries include a finder list as an appendix. The Longman
Language Activator took an innovative approach by alphabetizing all
the words as minor entries, which refer the user to semantic domain
islands where the words are described. The minor entries and domains
are all alphabetized in a single listing. Ken Smith's Sedang
dictionary is more traditional, but includes selected domains in
boxes. Other dictionaries include selected domains in an appendix. Any
special organization or set of words, such as proper nouns, can be
included in an appendix, but doing so increases the size and cost of
the book. Islands can take the form of actual boxes set into the text,
tables within an entry, special sections at the end of an entry,
subentries under a main entry, or special entries set off by special
type as in the Longman Language Activator. One way of handling
derivatives is to list them at the end of an entry. You can either
include all derivatives or only those with unpredictable meaning. You
can organize them in a hierarchy of derivation or list them
alphabetically. You can list them on the margin or in paragraph
format. You can simply list them or supply additional information such
as part of speech and a short definition.

(3) A dictionary can be organized by root or stem. Each has its
advantages and disadvantages. The primary disadvantage of a root
dictionary is that unsophisticated users have great difficulty in
stripping off affixes and identifying the root. If there is any kind
of allomorphy, the task can become nearly impossible, even for
linguists. One solution is to include every derivation as a minor
entry, but this also adds to the size and cost of the book. Conversely
you can organize by stem and indicate the root of every derivative.
Then you can include each root and its derivatives in boxes or in an
appendix. A language with inflectional prefixes has the same sort of
problems as a root dictionary.
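
If every stem entry records its root, both the minor entries of a root
dictionary and the root boxes of a stem dictionary can be generated
from the same data. A minimal sketch in Python, using English words as
stand-in data; the STEM_ENTRIES mapping and both function names are
hypothetical.

```python
from collections import defaultdict

# Stand-in data: each stem (full word) mapped to its root.
STEM_ENTRIES = {
    "happiness": "happy",
    "happy": "happy",
    "teach": "teach",
    "teacher": "teach",
    "teaching": "teach",
    "unhappy": "happy",
}


def root_index(stem_entries):
    """Group derivatives under their root, for boxes or an appendix."""
    index = defaultdict(list)
    for stem, root in stem_entries.items():
        if stem != root:
            index[root].append(stem)
    return {root: sorted(derivs) for root, derivs in sorted(index.items())}


def minor_entries(stem_entries):
    """Cross-references a root dictionary would need for each derivative."""
    return [
        f"{stem}: see {root}"
        for stem, root in sorted(stem_entries.items())
        if stem != root
    ]


print(root_index(STEM_ENTRIES))
print(minor_entries(STEM_ENTRIES))
```

On paper each of these listings costs pages; electronically they are
just two queries over one table.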

(4) The needs and limitations of the intended audience must be kept in
mind. Most users will never read the introduction or receive any
training in using the dictionary. So the organization must be
transparent and easily mastered. The user must be able to find the
information he is seeking quickly and easily. For instance the Oxford
Learner's Wordfinder Dictionary is organized by semantic domain. All
words are described under the relevant domain. However there is no
finder list and very few minor entries. So it can be quite frustrating
trying to find a word. The user must be familiar with the list of
domains and be able to guess what domain any given word is likely to
belong to. (On the positive side it is a great language learning
tool.) The needs of the user are a little hard to ascertain, since
people use dictionaries for so many different reasons. However you can
usually determine the primary needs of the target audience and design
your dictionary accordingly.

Obviously your options are mind boggling. If you publish
electronically, you can have it all. But first we have to write the
program that will do it all for us. Then we have to get our data into
the proper format and upload it to a web site that has the program
online. I'm looking forward to the day when all 6000 languages are
accessible in this way.

One final note. Jess's question about putting hyphens at morpheme
breaks is a nice example of the difference between electronic and hard
copy publishing. If you publish electronically, it is a simple matter
of correctly formatting the data. I would suggest having one field
contain the head word without hyphens and another with hyphens. But if
you publish on paper you have to choose one or the other. Otherwise
you substantially increase the size of the dictionary. Adding the
hyphens increases the size a little bit. You indicated that there were
consequences for the orthography--phonemic vs. phonetic. I've been
working for many years to add a morphemic breakdown to a Koine Greek
dictionary and have found numerous problems with allomorphy. I've had
to add numerous fields in order to capture surface vs. underlying
forms, and to get the data to sort in useful ways. Providing complex
data analysis of this sort is possible electronically, but is
impossible in hard copy. I've written at length because I'm advocating
electronic publishing in lieu of or in addition to hard copy
publishing.
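
The two-field suggestion can be sketched as follows, in Python with
English stand-in words. The Headword class and its field names are
hypothetical. The sketch also shows why the plain field is the one to
sort on: a hyphen inside the sort field changes the order.

```python
from dataclasses import dataclass


@dataclass
class Headword:
    plain: str      # head word without hyphens, used for sorting and lookup
    segmented: str  # head word with hyphens at morpheme breaks, used for display

    def is_consistent(self) -> bool:
        # Holds only when surface and segmented spellings match exactly;
        # with allomorphy further fields would be needed.
        return self.segmented.replace("-", "") == self.plain


WORDS = [
    Headword("undo", "un-do"),
    Headword("uncle", "uncle"),
    Headword("teacher", "teach-er"),
]

by_plain = [w.segmented for w in sorted(WORDS, key=lambda w: w.plain)]
by_segmented = sorted(w.segmented for w in WORDS)

print(by_plain)      # order a user would expect from the alphabet
print(by_segmented)  # order disturbed by the hyphen character
```

Sorting on the plain field files "uncle" before "un-do", while sorting
on the segmented field reverses them, because the hyphen sorts before
any letter.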

Ron Moe

-----Original Message-----
From: yahganlang [mailto:phonosemantics at e...]
Sent: Sunday, January 04, 2004 10:34 AM
To: lexicographylist at yahoogroups.com
Subject: [Lexicog] introduction and too many issues


Hi. I'm Jess Tauber- many of you may know me from my forays into
sound symbolism.

For the past few years I've been active in collecting and re-editing
materials relating to the Yahgan language (also variously spelled
Yagan, Yaghan, Jagan, Iakan, etc.) of Tierra del Fuego, which is on
its face a genetic isolate, and has only one speaker left (though there
are tantalizing reports now of a second).

The dictionary of the language, compiled in the late 19th century by
the Rev. Thomas Bridges, has been a problem child, to say the least.
A final version, completed in the mid 1880's, contained around 32000
headwords, but has since been lost (though I hope somebody has it in
their collection somewhere). An earlier draft, of about 23000 headwords,
was edited and published in Austria in 1933.

The editors made quite a mess of the resulting document. Due to funding
limitations during the Depression, they decided to compact the
dictionary by massive use of abbreviations, consolidation of separate
definitions, and use of the word "ditto". They give a garbled
description of the proper "unzipping" process in the introduction,
which even if followed perfectly makes for very tedious utilization
of this resource.

However, massive numbers of printer errors make the reconstitutional
formula unreliable, as I found out again and again. Luckily the heirs
of the Rev. Bridges have been very kind in sending me a xerographic
copy of the original manuscript set (though the last section is
missing, having been lost by the editors during WWII- this section
will always have question marks attached to it).

The handwriting of the ms is very often quite hard to read, due to
Bridges' own paper-saving habit of cramming new heads and definitions
between old (which I'm sure was rectified in the final version, but
what can one do?). I've often needed my reconstituted version of the
published version to figure out what Bridges said in the ms. The ms.,
on the other hand, shows all the errors of the published 1933 volume.
Some are just laughable, others make me want to cry.

Both Bridges and the editors used their own idiosyncratic spelling
systems (though the Anthropos system of the latter was pretty well
known early in the last century). Converting all this to modern
phonemic or phonetic rendering is relatively straightforward. And
every other worker has had his or her own particular system that
needs reworking as well. The new standard spelling conventions for
the language fail to capture the detail of those from the 19th C.,
and add phonemicizations where they may or may not necessarily be
justified. The issue here is whether to convert to the new standard
or to argue for a newer one based on the historical facts of the
language.

The alphabetical order of the dictionary (which was preserved in the
published version) is also unusual, based on the order of the Ellis
phonetic script which Bridges later modified for Yahgan- should this
be left as is, or redone? The order puts all the vowel-initial forms
first before those with initial consonants, and there is a kind of
psychological unity in this mode of presentation.
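
In an electronic edition the choice need not be final, since any
alphabetical order is just a sort key. A minimal sketch in Python,
assuming a made-up vowel-first ordering of the Latin alphabet (not the
actual Ellis or Bridges order) and English stand-in words:

```python
# Hypothetical collation order: all vowels first, then the consonants.
ORDER = "aeiou" + "bcdfghjklmnpqrstvwxyz"
RANK = {ch: i for i, ch in enumerate(ORDER)}


def collation_key(word):
    """Map each letter to its rank; unknown characters sort last."""
    return [RANK.get(ch, len(ORDER)) for ch in word.lower()]


words = ["banana", "apple", "cat", "owl", "egg"]

vowel_first = sorted(words, key=collation_key)
conventional = sorted(words)

print(vowel_first)
print(conventional)
```

Replacing ORDER with the real sequence of the script would reproduce
the original order, and the conventional order stays available for
users who prefer it.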

I've spent a great deal of time analyzing headwords into their
respective morphemes, and am satisfied I've pretty much got them all
correct. Should the heads be presented as unanalyzed strings, or with
hyphens between morphemes? If the latter, then this affects the
choice of a phonemic versus a phonetic spelling.

Most of the 23000 headwords are in fact derivations. Bridges claimed
he only includes forms when the semantics are not predictable from
the parts, but this is just not true, for most of them. Should
derivations with predictable meanings be left out? Or should they be
subsumed under the basic root head? Which brings up another related
issue. I've had a strong desire to organize the entire reworked
dictionary by roots and affixes- which is great for linguists, but
maybe not so good for any future speakers, if the language can be
saved.

One possible solution is what Anthony Mattina did for his Colville-
Okanagan dictionary. He has an ocean of alphabetically ordered
headwords (most derived) with islands, also alphabetically ordered,
organized by root, with derivational subheadings, and many textual
examples as well. Some might find this mode of organization somewhat
cluttered, but at least you can find things very quickly.

Finally, what about organizing by semantic area, as I've seen done
for "classified word lists" for some Salish languages? For a smaller
vocabulary summary, would this be a good presentational mode,
especially for teaching purposes? Some of the dictionaries I've seen
include such a summary as an appendix.

All of the basic work on this lexicon is now nearly done- and soon
there will be a web site at Dartmouth for the language. I'd like
there to be an interactive dictionary program, and this is one of the
things I'd like to discuss with folks. I did attend the EMELD
conference in Michigan this past summer, to get an idea of "best
practice" at all levels of collection to presentation and archiving
of materials.

Anyway, that pretty well does it as a longish intro - comments of
course welcome.





--- End forwarded message ---


