Action Items from LSA Endangered Lg. Mtg. (Summary)

Mark P. Line mline at ix.netcom.com
Wed Jan 29 23:34:39 UTC 1997


Tony Woodbury wrote:
>
> Johanna Nichols wrote:
>
> >Given all the work that's been done on English and its long
> >lexicographic history, surely there must exist somewhere in the
> >electronic public domain an up-to-date sophisticated canned generic
> >X-English, English-X lexical skeleton that I can graft onto and adjust
> >to my Ingush data [...]
>
> Perhaps anyone with information on this could post it to the
> Endangered-Languages-L list so that it can be shared widely and easily.

I think you could do worse than building your bilingual dictionary against
WordNet, an online (and downloadable) dictionary of English which also
includes a quite highly differentiated thesaurus of the readings (or
word-senses, called synonym sets, or 'synsets', in WordNet) expressible by
the lexical items listed. The database contains over 90,000 synsets, so
English lexical territory is pretty well covered. The WordNet home page is

http://www.cogsci.princeton.edu/~wn/

and it contains links to all the various online front-ends to the database
that people have built, and to the FTP site where you can download the
database itself.

I think I would probably try to map my Ingush (or whatever) lexis onto the
synsets (not directly onto English lexemes), since it is there that the
English terms have already been disambiguated (the word 'nutcracker' is
broken down into several synsets, for instance -- one of them is a
mechanical device for cracking nuts, and the others are kinds of birds).
Prose definitions are provided for almost all the synsets, and you could
use these in your dictionary, too -- either in English, or translated into
Ingush.

Once your Ingush lexical items have been mapped onto one or more synsets
(perhaps after creating your own new synsets for lexical territory commonly
covered by Ingush speakers but less so in English), it's then a no-brainer
to go from the synsets to the English lexical item(s) that correspond
to them. Although I haven't used WordNet in precisely this way yet, I
think it's an excellent foundation for any kind of bilingual lexicography
where one of the languages is English.
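
To make the idea concrete, here is a minimal sketch in Python of mapping
headwords onto synsets and then deriving both halves of the dictionary pair
from that one mapping. Everything in it is an assumption for illustration:
the synset ids, lemma lists and glosses are invented toy data (not copied
from WordNet and not in WordNet's file format), and "ingush_word_1" and
"ingush_word_2" are placeholders, not real Ingush items.

```python
from collections import defaultdict

# Toy stand-in for WordNet: synset id -> (English lemmas, gloss).
# Ids, lemma lists and glosses are invented for this sketch.
SYNSETS = {
    "nutcracker.tool": (["nutcracker"], "a device for cracking nuts"),
    "nutcracker.bird": (["nutcracker"], "a kind of bird"),
    "nuthatch.bird": (["nuthatch", "nutcracker"], "another kind of bird"),
}

# Placeholder headwords standing in for real Ingush lexical items,
# each mapped onto one or more synset ids (not onto English words).
INGUSH_TO_SYNSETS = {
    "ingush_word_1": ["nutcracker.tool"],
    "ingush_word_2": ["nutcracker.bird", "nuthatch.bird"],
}


def ingush_to_english(mapping, synsets):
    """For each headword, list (synset id, English lemmas, gloss)."""
    return {
        headword: [(sid, synsets[sid][0], synsets[sid][1]) for sid in ids]
        for headword, ids in mapping.items()
    }


def english_to_ingush(mapping, synsets):
    """For each English lemma, group headwords by the sense they cover."""
    index = defaultdict(lambda: defaultdict(list))
    for headword, ids in mapping.items():
        for sid in ids:
            for lemma in synsets[sid][0]:
                index[lemma][sid].append(headword)
    return {lemma: dict(senses) for lemma, senses in index.items()}


if __name__ == "__main__":
    for lemma, senses in english_to_ingush(INGUSH_TO_SYNSETS, SYNSETS).items():
        for sid, headwords in senses.items():
            print(lemma, "|", SYNSETS[sid][1], "|", ", ".join(headwords))
```

Because the English side is keyed by synset, the ambiguous English word
'nutcracker' comes out with its senses already separated, each one pointing
at the headword(s) mapped to it.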

If anybody needs help in manhandling the WordNet files into some more
useable format, let me know. In fact, I've already created flat-files of
WordNet synsets for a different project (unrelated to endangered
languages), which you can download from

       ftp://ftp.eskimo.com/u/w/waldzell/Yiklamu/Lexicon/

if you like. The format of these files (after being gunzipped) is
completely undocumented; if anybody needs help figuring out what's what,
let me know -- it should be obvious, though, if you compare an entry from
my files with an online WordNet interface on the Web somewhere. The first
field, incidentally, contains a lexical item in Yiklamu, which you can
safely ignore.

Also, it's relatively straightforward to generate extracts of additional
information from the WordNet database to link to these synset files. Using
the semantic network information connecting the synsets, for instance, it
would be pretty easy to construct a (monolingual or bilingual) thesaurus
in addition to the usual bilingual dictionary pair.
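
As a rough illustration of the thesaurus idea, the Python sketch below
turns broader-term links between synsets into thesaurus entries with
"broader" and "narrower" terms. The synset ids, lemmas and links are toy
data assumed for the example; they are not taken from the WordNet database
or from my flat files, and the real network is considerably richer.

```python
from collections import defaultdict

# Toy synsets: synset id -> English lemmas (invented for this sketch).
LEMMAS = {
    "bird": ["bird"],
    "nutcracker.bird": ["nutcracker"],
    "nuthatch.bird": ["nuthatch"],
}

# Toy semantic links: child synset id -> parent (broader) synset id.
BROADER = {
    "nutcracker.bird": "bird",
    "nuthatch.bird": "bird",
}


def build_thesaurus(lemmas, broader):
    """Return synset id -> {'terms', 'broader', 'narrower'} entries."""
    # Invert the child -> parent links to find each synset's children.
    children = defaultdict(list)
    for child, parent in broader.items():
        children[parent].append(child)

    thesaurus = {}
    for sid, terms in lemmas.items():
        parent = broader.get(sid)
        thesaurus[sid] = {
            "terms": terms,
            "broader": lemmas[parent] if parent else [],
            "narrower": sorted(
                lemma for c in children.get(sid, []) for lemma in lemmas[c]
            ),
        }
    return thesaurus


if __name__ == "__main__":
    for sid, entry in build_thesaurus(LEMMAS, BROADER).items():
        print(sid, entry)
```

A bilingual version would only need the term lists to carry the mapped
headwords of the other language alongside (or instead of) the English
lemmas.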


-- Mark

(Mark P. Line   ----   Bellevue, Washington   ----   mline at ix.netcom.com)

----
Endangered-Languages-L Forum: endangered-languages-l at carmen.murdoch.edu.au
Web pages http://carmen.murdoch.edu.au/lists/endangered-languages-l/
Subscribe/unsubscribe and other commands: majordomo at carmen.murdoch.edu.au
----


