[Lexicog] word list
Hayim Sheynin
hsheynin19444 at YAHOO.COM
Mon Dec 11 02:56:39 UTC 2006
Dear Bill and Mike,
Considering all the trouble caused by multiple forms of the same word, for both verbs and nouns, would it not be better to use the headword list of a solid printed English dictionary such as the OED, Webster's, or even the Random House Dictionary? Depending on the language, it is also possible to use a large translation (bilingual) dictionary for a language in which the compiler is fluent. The advantage of such lists is that they indicate both the type of each verb and the gender of each noun.
For the classical languages, Greek and Latin, it is easier, because words are usually listed in a representative (citation) form with an indication of their conjugation or declension.
One should recognize that it is not always practical to work from a corpus when compiling a dictionary, especially an Urdu-English dictionary.
In that case I would use large English-Persian and English-Arabic dictionaries.
Hayim Sheynin
Mike Maxwell <maxwell at ldc.upenn.edu> wrote:
billposer at alum.mit.edu wrote:
> If you're just looking for a large wordlist, one such list
> is the list that is distributed with many GNU/Linux systems,
> usually in /usr/share/dict/words. The older list is only about 45,000
> words, but some systems have a longer list of over 200,000 words.
That will be wordforms (including inflected forms), won't it, Bill? So
nearly every noun will have two forms (or three, if you count
possessives--but maybe the GNU/Linux apps are smart enough that they
don't need those).
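Mike's point about wordforms versus lexemes can be checked directly against a system word list. The Python sketch below is a hedged illustration, not something proposed in the thread: it assumes the list is a UTF-8 file with one entry per line at /usr/share/dict/words (the path Bill mentions; it may be absent on some systems), and the helper names `parse_wordlist`, `load_wordlist` and `split_possessives` are hypothetical. It only separates entries ending in 's; it does not collapse plurals or verb inflections into lexemes.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable

# Path mentioned in the thread; not every system ships this file.
DEFAULT_WORDLIST = "/usr/share/dict/words"


def parse_wordlist(text: str) -> list[str]:
    """Return one entry per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_wordlist(path: str = DEFAULT_WORDLIST) -> list[str]:
    """Read a one-entry-per-line word list (assumed UTF-8)."""
    return parse_wordlist(Path(path).read_text(encoding="utf-8", errors="replace"))


def split_possessives(entries: Iterable[str]) -> tuple[set[str], set[str]]:
    """Split entries into (other_entries, entries_ending_in_'s).

    Crude heuristic: contractions such as "it's" are also caught, and
    plurals or verb inflections are left untouched.
    """
    other: set[str] = set()
    possessive: set[str] = set()
    for entry in entries:
        # Accept both the ASCII apostrophe and U+2019 (right single quote).
        if entry.endswith(("'s", "\u2019s")):
            possessive.add(entry)
        else:
            other.add(entry)
    return other, possessive


if __name__ == "__main__":
    if Path(DEFAULT_WORDLIST).exists():
        words = load_wordlist()
        other, possessive = split_possessives(words)
        print(f"{len(words)} entries, {len(possessive)} ending in 's")
    else:
        print(f"{DEFAULT_WORDLIST} not found")
```

Keeping the parsing separate from the file reading means the same check can be run on any list, including one built from a corpus.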
Speaking of the size of word lists, I saw what had to be one of the
dumbest reasons to make English, rather than French, the main language
(in some sense, I don't recall what) of the EU: English has more words.
I forget the counts--something like 450k for English vs. 200k for
French. Unfortunately I don't recall the citation. I'm guessing that
they were talking about lexemes, else the many inflected forms of French
verbs would, I would have thought, have increased the French number. No
idea whether they included English particle verbs, or how they drew the
line between which compound nouns to include and which not to include.
> Of course another way of obtaining a wordlist is simply to acquire
> a big chunk of English text (say some combination of internet
> posts and books from Project Gutenberg) and extract from it a
> list of the unique words.
Don't forget to include Canterbury Tales if you're doing books, or
http://houseoffame.blogspot.com/ if you're doing internet posts :-).
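Bill's suggestion of extracting the unique words from a body of text can be sketched in a few lines of Python. This is a hedged illustration rather than anything proposed in the thread: the tokenizing regular expression (letters with optional internal apostrophes or hyphens), the lowercasing default, and the names `count_words`, `unique_words` and `read_corpus` are all assumptions. Like the system list, the result is a list of wordforms, not lexemes.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator

# A "word" here is a run of letters (any script), optionally joined by
# internal apostrophes or hyphens: "cat's", "don't", "well-known".
# Digits and underscores are excluded. This rule is an assumption.
WORD_RE = re.compile(r"[^\W\d_]+(?:['\u2019-][^\W\d_]+)*")


def count_words(texts: Iterable[str], lowercase: bool = True) -> Counter:
    """Count every wordform occurring in the given texts."""
    counts: Counter = Counter()
    for text in texts:
        if lowercase:
            text = text.lower()
        counts.update(WORD_RE.findall(text))
    return counts


def unique_words(texts: Iterable[str], lowercase: bool = True) -> list[str]:
    """Return the distinct wordforms, sorted."""
    return sorted(count_words(texts, lowercase))


def read_corpus(paths: Iterable[str]) -> Iterator[str]:
    """Yield the contents of each text file (assumed UTF-8)."""
    for path in paths:
        yield Path(path).read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    import sys

    # Usage: python wordlist.py book1.txt book2.txt ...
    counts = count_words(read_corpus(sys.argv[1:]))
    for word in sorted(counts):
        print(word, counts[word], sep="\t")
```

Keeping the counts rather than just the set makes it easy to drop one-off forms, which in web text are often typos. Plain-text books from Project Gutenberg also carry a license header and footer that one would probably want to strip before counting.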
--
Mike Maxwell
maxwell at ldc.upenn.edu