[Lexicog] Field research based dictionaries

Mike Maxwell maxwell at LDC.UPENN.EDU
Thu Apr 1 18:54:52 UTC 2004


Masha Brykina wrote:
> I am involved in a project aimed to make a Besermyan-Russian
> dictionary...
> we've collected over 3000 basic lexemes with some examples,
> which we have organized in a database made in Microsoft Access.
> During our work we were confronted with certain problems and
> would like to learn more about these points (perhaps you could
> provide some references):
> 1)      Whether anyone has experience using MS Access for such goals.

I haven't used Access for dictionaries, but the general consensus among
those who have done such things is, don't.  For starters, I have my doubts
whether there is a long term future at Microsoft for Access, as opposed to
say MS SQL Server.  A little searching at Microsoft's web site would
probably turn up the facts on this.

But perhaps more importantly, unless you're doing something really different
in your dictionary from what everyone else is doing, trying to build a
lexical database in Access is something like trying to build an automobile
from scratch.  It can be done, but why not buy an automobile?  There are a
number of tools already "out there" for doing lexical databases, and while
they may not be suitable for, say, the Oxford English Dictionary, they are
perfectly suited to smaller dictionaries (and a 3000- or even 10,000-lexeme
dictionary is comfortably small).

Other reasons for not using Access (or any other proprietary software like
that) can be found in Steven Bird and Gary Simons' article in Language 79:
557-582.  You might also visit the EMELD web site (http://emeld.org/),
particularly their recommendations on best practice(s).

(I'll defer to someone else on questions 2-3.)

> 4)      Whether there are any works on principles of making alphabets.
> For example, we have problems with shwa-sound. There are
> several variants of this sound, which don't seem to be phonological,
> but which are easily distinguished by the native speakers.

There's a huge amount of work on this, dating back to Ken Pike's 1947 book
"Phonemics", with its appendix (chapter? I forget) on practical
orthographies.  I find this book (or at least the appendix) surprisingly
accurate despite all the water that has gone under the bridge since then.

In this particular case, I would guess the first issue is to document the
evidence that the variants are not phonological (i.e. that they are
phonologically predictable).  A lot will depend on the conditioning
environments, e.g. whether they are completely phonologically-based, or
whether they must take into account morphological information.  The latter
sort of rules tend to make the variants much more perceptible to native
speakers.  (And of course for Pike, there was a clear-cut distinction
between the two rule types.  Most modern phonologists would disagree.)
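
One way to start documenting that evidence is to tabulate the environments
in which each variant is attested and look for overlaps. The following is
only a rough sketch in Python: the variant labels ("schwa1", "schwa2"), the
environment classes ("pal", "nonpal", "C", "#") and the function names are
all made up for illustration and are not Besermyan data.

```python
from collections import defaultdict

def overlapping_environments(tokens):
    """Return the environments in which more than one variant is attested.

    Each token is a (variant, left_context, right_context) triple.
    An empty result is consistent with complementary distribution.
    """
    variants_by_env = defaultdict(set)
    for variant, left, right in tokens:
        variants_by_env[(left, right)].add(variant)
    return {env: vs for env, vs in variants_by_env.items() if len(vs) > 1}

# Hypothetical transcribed tokens (labels are assumptions, not real data).
tokens = [
    ("schwa1", "pal", "C"),
    ("schwa1", "pal", "#"),
    ("schwa2", "nonpal", "C"),
    ("schwa2", "nonpal", "#"),
]

no_overlap = overlapping_environments(tokens)

# Adding one token of schwa2 after a palatal creates an overlap.
with_overlap = overlapping_environments(tokens + [("schwa2", "pal", "C")])
```

Finding no overlap in a small sample is only suggestive, of course. And if
the conditioning turns out to need morphological information, each token
would need an extra field for that, which this sketch does not have.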

But the overriding principle is that the native speaker is right.  Or as
I've put it when I've been involved in arguments over the "correct"
orthography, the best orthography is the one that is used.  If native
speakers insist on writing distinctions, who are we as linguists to say they
shouldn't?  One reason native speakers often write sub-phonemic distinctions
is that they have learned to read and write (or at least speak) another
language, often the dominant language of the country, and this language
makes a phonemic distinction that their own language does not--but they are
quite aware of it in their own language because of their bilingualism.

> ...And since
> Besermyan differs phonetically from Udmurt and therefore can be
> seen as having no writing system, one more problem arises - how
> to choose lexical entry in case there are several phonetic variants
> for one lexeme.

Here I guess you're saying there are dialectal differences.  Again, there's
a huge amount written on this, but I will (mostly) defer to those on this
list who are more familiar with that literature.  Of course if you can
choose a single letter to cover both pronunciations (the letter 'f', say,
for both a bilabial and a labiodental fricative), then it winds up being
just a writing system issue. But things are hardly ever that simple...

For lexicons, one approach is to use minor entries for the distinct forms
from all but one dialect.  The minor entry just refers to the major entry,
e.g.
    lorry: see truck
Of course there are difficult sociolinguistic issues here; this approach can
be seen as setting up one dialect as the preferred dialect, and the others
as less prestigious.
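
In database terms, a minor entry is just a record that points at another
headword instead of carrying its own definition. A minimal sketch in Python,
where the Entry fields, the dialect labels and the resolve() helper are my
own assumptions rather than the format of any particular lexicon tool:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    headword: str
    gloss: Optional[str] = None    # filled in for major entries
    see: Optional[str] = None      # filled in for minor entries
    dialect: Optional[str] = None  # illustrative label only

def resolve(lexicon, headword):
    """Follow 'see' references until a major entry is reached."""
    seen = set()
    entry = lexicon[headword]
    while entry.see is not None:
        if entry.headword in seen:
            raise ValueError("circular cross-reference at " + entry.headword)
        seen.add(entry.headword)
        entry = lexicon[entry.see]
    return entry

entries = [
    Entry("truck", gloss="large motor vehicle for carrying goods", dialect="US"),
    Entry("lorry", see="truck", dialect="UK"),
]
lexicon = {e.headword: e for e in entries}

major = resolve(lexicon, "lorry")
```

Here resolve(lexicon, "lorry") returns the "truck" entry. Which form gets to
be the major entry is still an editorial decision; the structure itself does
not make that choice for you.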

I guess it's also important to distinguish between what you'll do in the
dictionary and what you (or others) do in texts.

    Mike Maxwell
    Linguistic Data Consortium
    maxwell at ldc.upenn.edu


