A million English words, or only 600,000? Either way, it's a language packed with more words than you'll ever need

Benjamin Zimmer bgzimmer at BABEL.LING.UPENN.EDU
Wed Jul 9 15:41:28 UTC 2008


On Wed, Jul 9, 2008 at 11:33 AM, Laurence Horn <laurence.horn at yale.edu> wrote:
>
> At 3:04 PM +0000 7/9/08, Tom Zurinskas wrote:
> >One person said there are 2 billion English words.  Another said 1
> >million.  That's a difference factor of 2,000.   That's like looking
> >at a tree and one person estimating it's 1 inch tall, while the
> >other estimates it's 2000 inches tall (about 167 feet).
>
> No it's not.  Nobody (except you) was claiming that there are 2
> billion *different words* in the English language.  There was
> discussion of a 2 billion word database, but unless every word in the
> database is distinct from every other word (i.e. no repetitions,
> so every word has a frequency count of 1), the number of *types*
> (which is what's under discussion in this thread) will be far smaller
> than the number of *tokens*.  In "The woman discussed the letter with
> the man" there are 8 word tokens but 6 word types.  Pick up a
> newspaper and see how many paragraphs (if any) you can find that have
> the same number of word types and word tokens.  This has been
> explained several times on the list, so I'm not exactly sure why I'm
> trying to do it again...

And explained to Tom Z. directly at least twice...

http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0710C&L=ADS-L&P=R6213
http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0710E&L=ADS-L&P=R3913
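
For anyone who would rather see the type/token distinction counted out
than explained again, here is a minimal sketch in Python. It makes a few
assumptions that are not part of Larry's message: a "word" is any run of
letters or apostrophes, and case is ignored, so "The" and "the" count as
one type. The function name count_tokens_and_types is made up for this
illustration.

```python
import re
from collections import Counter


def count_tokens_and_types(text):
    """Return (token count, type count, per-type frequencies) for text."""
    # Assumption: lowercase everything so "The" and "the" are the same type,
    # and treat any run of letters/apostrophes as one word.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    # Tokens = every running word; types = distinct words.
    return len(words), len(freq), freq


sentence = "The woman discussed the letter with the man"
tokens, types, freq = count_tokens_and_types(sentence)
print(tokens, types)   # 8 6
print(freq["the"])     # 3
```

Run over a large corpus, the same counter would show the token count
growing much faster than the type count, which is why a 2 billion word
database does not imply 2 billion different words.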


--Ben Zimmer

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


