[Lexicog] word frequency

Jan F. Ullrich jfu at CENTRUM.CZ
Tue Sep 13 17:39:29 UTC 2005


Dear lexicographers

I have a question concerning work on a dictionary and a text-corpus using
Shoebox/Toolbox.

Our current Lakota-English/English-Lakota dictionary database is in Toolbox
format and contains over 40,000 entries. This database is in part generated
from a text corpus of several million words.
I have been using occurrence frequency in the text corpus as one of the
criteria for including a lemma in a smaller student dictionary (about
10,000-15,000 entries). The corpus-based wordlist with frequency numbers has
been very helpful for this, although I wish some of the steps could be
automated.
I noticed that some of the recent commercially published student
dictionaries (of English, Spanish, etc.) mark the most frequent words (e.g.
the 4,000 most frequent words out of 30,000 entries). Hence my questions:

1)
Can the occurrence count of each word in the wordlist (created from the
text corpus) be transferred into the corresponding entry of the dictionary
database, assuming there is a field marker for it?
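For illustration only, here is a minimal Python sketch of question 1 done
outside Toolbox, under several assumptions: the wordlist is exported as
plain text with one "word<TAB>count" pair per line, each entry begins with
the standard \lx line, and the frequency goes into a new field whose marker
"\fq" is hypothetical (pick whatever marker your database type defines).
It matches wordlist forms against headwords exactly, so inflected forms in
the corpus are not folded into their lemmas.

```python
def load_frequencies(lines):
    """Parse 'word<TAB>count' lines (assumed export format) into a dict."""
    freqs = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip headers or malformed lines that are not exactly word + count.
        if len(parts) == 2 and parts[1].strip().isdigit():
            freqs[parts[0].strip()] = int(parts[1])
    return freqs


def add_frequency_field(sfm_text, freqs, marker="\\fq"):
    """Insert a (hypothetical) frequency field right after each \\lx line
    whose headword appears in the frequency list."""
    out = []
    for line in sfm_text.splitlines():
        out.append(line)
        if line.startswith("\\lx "):
            headword = line[4:].strip()
            if headword in freqs:
                out.append(f"{marker} {freqs[headword]}")
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    return result + "\n" if sfm_text.endswith("\n") else result


# Usage with made-up counts, for illustration only:
freqs = load_frequencies(["wičháša\t1520", "mní\t870"])
db = "\\lx wičháša\n\\ge man\n\n\\lx mní\n\\ge water\n"
print(add_frequency_field(db, freqs))
```

The function does not check for an existing frequency field, so running it
twice on the same database would add a second one; `load_frequencies` also
accepts an open file object, since it only iterates over lines.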

2)
Is it possible for MDF to export some sort of mark that would flag a
certain number of the most frequent words?
I assume a CC table could add a note to all entries whose frequency number
is above the value we set. The question is what type of mark or flag MDF
can add to the paragraph. The student dictionaries I saw use certain
symbols (e.g. an asterisk or a bullet) right in the white space in front of
the lemma.
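The database side of question 2 (deciding which entries get the mark) can
be sketched the same way; whether and how MDF then prints such a field as a
symbol in front of the lemma is a separate question this sketch does not
answer. Assumptions: entries start with \lx, counts sit in the hypothetical
"\fq" field from the previous step, and the flag goes into an equally
hypothetical "\fl" field containing "*". Selecting the top N entries is
equivalent to the threshold idea above, with the threshold set to the Nth
highest count.

```python
def split_records(sfm_text):
    """Group lines into records; a new record starts at each \\lx line."""
    records, current = [], []
    for line in sfm_text.splitlines():
        if line.startswith("\\lx ") and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    return records


def record_frequency(record, freq_marker):
    """Return the count stored in the frequency field, or 0 if absent."""
    for line in record:
        if line.startswith(freq_marker + " "):
            value = line[len(freq_marker) + 1:].strip()
            if value.isdigit():
                return int(value)
    return 0


def flag_top_n(sfm_text, n, freq_marker="\\fq", flag_marker="\\fl"):
    """Add a flag field after the \\lx line of the n most frequent entries."""
    records = split_records(sfm_text)
    # Only real entries (records starting with \lx) are candidates.
    entries = [i for i, r in enumerate(records) if r[0].startswith("\\lx ")]
    # sorted() is stable, so ties at the cutoff go to the earlier entry.
    ranked = sorted(entries,
                    key=lambda i: record_frequency(records[i], freq_marker),
                    reverse=True)
    top = {i for i in ranked[:n]
           if record_frequency(records[i], freq_marker) > 0}
    out = []
    for i, record in enumerate(records):
        out.append(record[0])
        if i in top:
            out.append(f"{flag_marker} *")
        out.extend(record[1:])
    return "\n".join(out)


# e.g. mark the 4,000 most frequent entries:
# flagged = flag_top_n(database_text, 4000)
```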


Jan F. Ullrich
Lakota Language Consortium
www.lakhota.org




