[Lexicog] Discovering the lexicon via semantic domains

Ron Moe ron_moe at sil.org
Wed Dec 31 00:10:51 UTC 2003


> For the past three years I have been developing a lexicography tool that I
> call the Dictionary Development Program (DDP). At this point in its
> development it is useful as a word collection tool. I use a list of 1,750
> semantic domains that I have compiled from numerous sources. I have
> attempted to make it as universal and exhaustive as possible, but it needs
> more input from non-Indo-European languages. In November I released
> version 2 of the DDP. I am currently on vacation and leaving in an hour for
> a week. When I get back I would be happy to send the materials to anyone
> interested. Send me an email and I'll send you the materials via email.
> They are about 1 MB.
>
> Essentially, the method uses semantic domains to prompt speakers of a
> language to think of the words in their language that belong to each
> domain. I've collected sample words from English, organized them into
> lexical sets, and written an elicitation question for each lexical set. An
> example from the domain 'Wind' would be:
>
> What words describe a wind that only lasts a short time? breath of air,
> puff of wind, gust, blast, flurry
>
> In a ten-day workshop, about 30 speakers of Lunyole (Bantu, Uganda)
> collected a total of 17,000 lexical items that boiled down to 12,000 unique
> words and phrases. The extra 5,000 were duplicates that showed up in more
> than one domain and often represent multiple senses. The method utilizes
> the mental network in each person's head. With a little practice it is
> possible to think of words almost as fast as you can write them down. This
> is far more efficient than using the text corpus method or any other
> method I have heard of.
>
> I also have the workshop participants gloss the words in the national
> language. Since the words are collected by domain, you end up with a
> classified, glossed word list and a 1,750-entry thesaurus. Once you collect
> the words, you can use automated routines to expand the word list into a
> basic dictionary. I'm beginning to work on materials to help speakers of a
> language define the words in each domain. My goal is to produce a tool that
> is as easy and efficient as possible, so that speakers of a language with
> little or no training in lexicography can produce a reasonably good
> dictionary of massive proportions.
>
> Lexicographers estimate that there are 23,000 unique words in most
> languages, with perhaps 50,000 lexical items including multiple senses and
> phrases. So even with my method we are only collecting about half the
> words. I hope to refine the method and increase this percentage. A
> dictionary of 3,000-4,000 entries is rather pitiful. (Pardon me for saying
> so.)
>
> The text corpus method is advantageous in many ways but useless in a
> language where there are no texts. Even when you have some texts, setting
> up the parser and manually adding entries to your database results in a
> small dictionary that is uneven in its depth and breadth of treatment. Even
> automating some sort of concordance program only results in a simple list
> of words with no gloss and no semantic classification. I prefer to collect
> the words all at once and use automated routines to develop the word list.
> Then, as time and opportunity allow, use the text corpus method to collect
> natural examples of usage. Many lexicographers recommend investigating
> semantics within the context of semantic domains.
>
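The bookkeeping behind the Lunyole figures (17,000 items reducing to 12,000 unique forms, with the repeats across domains flagging possible multiple senses) and the by-domain thesaurus can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DDP's actual routines: the function names, the light normalization (trim and lowercase), and the default lexicon estimate of 23,000 (taken from the figure quoted above) are assumptions, and every sample domain other than 'Wind' is made up using English stand-ins.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; they are not part of the DDP.


def build_word_index(elicited: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert domain -> words into word -> set of domains (a dictionary skeleton)."""
    index: dict[str, set[str]] = defaultdict(set)
    for domain, words in elicited.items():
        for word in words:
            # Assumed normalization: 'Gust' and 'gust ' count as one form.
            index[word.strip().lower()].add(domain)
    return dict(index)


def summarize(elicited: dict[str, list[str]]) -> dict[str, object]:
    """Count raw items, unique forms, and forms listed under more than one domain."""
    total_items = sum(len(words) for words in elicited.values())
    index = build_word_index(elicited)
    multi_domain = {w: sorted(d) for w, d in index.items() if len(d) > 1}
    return {
        "total_items": total_items,
        "unique_forms": len(index),
        "duplicates": total_items - len(index),
        # Forms found in several domains are candidates for multiple senses.
        "multi_domain": dict(sorted(multi_domain.items())),
    }


def thesaurus(elicited: dict[str, list[str]]) -> dict[str, list[str]]:
    """One sorted, de-duplicated word list per domain."""
    return {d: sorted({w.strip().lower() for w in ws}) for d, ws in elicited.items()}


def coverage(unique_forms: int, estimated_lexicon: int = 23_000) -> float:
    """Share of an estimated lexicon size (assumed default: 23,000) that was collected."""
    return unique_forms / estimated_lexicon


if __name__ == "__main__":
    # Hypothetical sample: only the 'Wind' list comes from the post.
    sample = {
        "Wind": ["breath of air", "puff of wind", "gust", "blast", "flurry"],
        "Snow": ["flurry", "snowflake"],
        "Explosion": ["blast", "bang"],
    }
    print(summarize(sample))
    print(thesaurus(sample)["Wind"])
    print(f"{coverage(12_000):.0%}")
```

With this sample, 9 raw items reduce to 7 unique forms, and "blast" and "flurry" surface as multi-domain candidates for separate senses. Applied to the workshop totals, the same subtraction gives 17,000 - 12,000 = 5,000 duplicates, and `coverage(12_000)` comes to about 52%, consistent with "about half" against the 23,000-word estimate.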
> Ron Moe
>
>
> -----Original Message-----
> From: Wayne Leman [mailto:wayne_leman at sil.org]
> Sent: Monday, December 29, 2003 2:15 PM
> To: lexicographylist at yahoogroups.com
> Subject: [Lexicog] Discovering the lexicon via semantic domains
>
>
> Lexicographers have used a variety of techniques to discover the
> words of the lexicon of a language, e.g.
>
> 1. combing through vernacular texts (I have heard of one linguist
> making a dictionary who would not enter a form in the lexicon until
> it was found in a text)
>
> 2. trying to match entries in a national language dictionary
>
> 3. semantic association through work with semantic domains of the
> language
>
> A Cheyenne lady and I used #3 when we made a Cheyenne Topical
> Dictionary a number of years ago. Cheyenne language teachers have
> found the topical dictionary helpful. But there are many more words
> in the language than those we came up with using the particular
> semantic domains we investigated.
>
> What are some techniques that you all have found helpful to discover
> lexical entries for a language? In particular, how have you increased
> the number of lexical entries beyond a typical basic number of forms
> based on the semantics of some basic anthropological linguistic check
> lists or similar lists?
>
> Although dictionaries for many previously unstudied languages
> typically contain 5,000 or so entries, we all know, I think, that the
> lexicons of these languages actually contain many more thousands of
> entries which are not found in such "basic" dictionaries. Have you
> found techniques to successfully get the number of valid lexical
> entries up to the neighborhood of 15,000-25,000, esp. within a
> relatively short period of research time?
>
> Wayne Leman
>
>
>
>
>
> Yahoo! Groups Links
>
> To visit your group on the web, go to:
> http://groups.yahoo.com/group/lexicographylist/
>
> To unsubscribe from this group, send an email to:
> lexicographylist-unsubscribe at yahoogroups.com
>
> Your use of Yahoo! Groups is subject to:
> http://docs.yahoo.com/info/terms/


