[Lexicog] word list

Hayim Sheynin hsheynin19444 at YAHOO.COM
Mon Dec 11 02:56:39 UTC 2006


Dear Bill and Mike,

Considering all the trouble caused by multiple forms of the same word, both
verbs and nouns, would it not be better to use the word list of a solid printed
English dictionary such as the OED, Webster, or even the Random House Dictionary?
It is also possible to use large translation dictionaries, depending on the
language in which the compiler is fluent. The convenience of their lists lies in
their indication of both the character of the verb and the gender of the noun.
    For the classical languages, Greek and Latin, it is easier, because words are usually given there in a representative form with an indication of conjugation or declension.
    One should recognize that it is not always practical to work from a corpus
when compiling a dictionary, especially an Urdu-English dictionary.
    For that case I would use large English-Persian and English-Arabic dictionaries.

Hayim Sheynin

Mike Maxwell <maxwell at ldc.upenn.edu> wrote:
 billposer at alum.mit.edu wrote:
 > If you're just looking for a large wordlist, one such list
 > is the list that is distributed with many GNU/Linux systems,
 > usually in /usr/share/dict/words. The older list is only about 45,000
 > words, but some systems have a longer list of over 200,000 words.
 
 That will be wordforms (including inflected forms), won't it, Bill?  So 
 nearly every noun will have two forms (or three, if you count 
 possessives--but maybe the GNU/Linux apps are smart enough that they 
 don't need those).
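To make the wordform question concrete, here is a minimal Python sketch that counts the distinct forms in a word list and then counts them again after folding possessives ("cat's") into their base forms. The function names are hypothetical, and the possessive test is a naive suffix check on a trailing 's. It does not attempt to merge plurals or other inflected forms, which would need a real morphological analyzer.

```python
def strip_possessive(form):
    """Drop a trailing 's so that "cat's" folds into "cat" (naive suffix check)."""
    if form.endswith("'s") and len(form) > 2:
        return form[:-2]
    return form


def count_forms(forms):
    """Return (distinct forms, distinct forms once possessives are folded in)."""
    distinct = {f.strip() for f in forms if f.strip()}
    collapsed = {strip_possessive(f) for f in distinct}
    return len(distinct), len(collapsed)


def count_forms_in_file(path="/usr/share/dict/words"):
    """Apply count_forms to a one-word-per-line file (the path is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return count_forms(fh)
```

Calling count_forms_in_file() on a system that ships the list shows how much of its size is due to possessives alone. The result will vary with the list installed.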
  
 Speaking of the size of word lists, I saw what had to be one of the 
 dumbest reasons to make English, rather than French, the main language 
 (in some sense, I don't recall what) of the EU: English has more words. 
 I forget the counts--something like 450k for English vs. 200k for 
 French.  Unfortunately I don't recall the citation.  I'm guessing that 
 they were talking about lexemes, else the many inflected forms of French 
 verbs would, I would have thought, have increased the French number.  No 
 idea whether they included English particle verbs, or how they drew the 
 line between which compound nouns to include and which not to include.
 
 > Of course another way of obtaining a wordlist is simply to acquire
 > a big chunk of English text (say some combination of internet
 > posts and books from Project Gutenberg) and extract from it a
 > list of the unique words.
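The extraction step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the text is already in memory as a string, everything is lowercased, and a "word" is a run of ASCII letters with optional internal apostrophes. The names WORD_RE and unique_words are hypothetical, and the result is still a list of wordforms, not lexemes.

```python
import re

# A "word" here is a run of ASCII letters, optionally joined by apostrophes
# (so "cat's" and "don't" survive as single tokens). This is an assumption;
# real text needs a more careful tokenizer.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def unique_words(text):
    """Return the sorted list of distinct lowercased wordforms in text."""
    return sorted(set(WORD_RE.findall(text.lower())))
```

For example, unique_words("The cat's hat. The CAT sat!") yields cat, cat's, hat, sat, the. The inflected and possessive forms remain separate entries, which is exactly the problem raised earlier in the thread.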
 
 Don't forget to include Canterbury Tales if you're doing books, or 
 http://houseoffame.blogspot.com/ if you're doing internet posts :-).
 -- 
  Mike Maxwell
  maxwell at ldc.upenn.edu
 
     
                       

 