How many words in English and how many does one know?
Tom Zurinskas
truespel at HOTMAIL.COM
Wed Oct 31 15:51:33 UTC 2007
Googling "How many words are there in English", I find the counts below. They fall far short of a billion.
http://www.worldwidewords.org/articles/howmany.htm
HOW MANY WORDS?
How many in the language and how many does any one person know?
One of the more common questions that arrive for the Q&A section asks how many words there are in the English language. Almost as common are requests for the average size of a person’s vocabulary. These sound like easy questions; I have to tell you that they’re indeed easy to ask. But they’re almost impossible to answer satisfactorily, because it all depends what you mean by word and by vocabulary (or even English).
What we mean by word sounds obvious, but it’s not. Take a verb like climb. The rules of English allow you to generate the forms climbs, climbed, climbable, and climbing, the nouns climb and climber (and their plurals climbs and climbers), compounds such as climb-down and climbing frame, and phrasal verbs like climb on, climb over, and climb down. Now, here’s the question you’ve got to answer: are all these distinct words, or do you lump them all together under climb?
That this is not a trivial question can be proved by looking at half a dozen current dictionaries. You won’t find two that agree on what to list. Almost every word in the language has this fuzzy penumbra of inflected forms, separate senses and compounds, some to a much greater extent than climb. To take a famous case, the entry for set in the Oxford English Dictionary runs to 60,000 words. The noun alone has 47 separate senses listed. Are all these distinct words?
And in a wider sense, what do you include in your list of words? Do you count all the regional variations of English? Or slang? Dialect? Family or private language? Proper names and the names of places? And what about abbreviations? The biggest dictionary of them has more than 400,000 entries — do you count them all as words? And what about informal and formal names for living things? The wood louse is known in Britain by many local names — tiggy-hog, cheeselog, pill bug, chiggy pig, and rolypoly among others. Are these all to be counted as separate words? And, to take a more specialist example, is Saccharomyces cerevisiae, the formal name for bread yeast, to be counted as a word (or perhaps two)? If you say yes, you’ve got to add another couple of million such names to the English-language word count. And what about medical terms, such as syncytiotrophoblastic or holoprosencephaly, that few of us ever encounter?
The other difficult term is vocabulary. What counts as a word that somebody knows? Is it one that a person uses regularly and accurately? Or perhaps one that will be correctly recognised — say in written text — but not used? Or perhaps one that will be understood in context but which the person may not easily be able to define? This distinction between what linguists call active and passive vocabularies is hard to measure, and it skews estimates.
The problem doesn’t stop there. English speakers not only know words, they know word-forming elements, such as the ending -phobia for some irrational fear. A journalist rushing to meet a deadline might take a word he knows, like Serb, and tack on the ending to make Serbophobia. He’s just added a word to the language (probably only temporarily), but can he really be said to have that word in his vocabulary? If nobody ever uses it again, can we legitimately count it? By reversing the coining process, a reader of the newspaper can easily work out the word’s origin and meaning. Has the reader also added a word to his vocabulary?
Can you now see why estimates of the total number of words in the English language and in a person’s vocabulary are so difficult to make, and why they vary so much one from another? David Crystal, in the Cambridge Encyclopedia of the English Language, suggests that there must be at least a million words in the language. Tom McArthur, in the Oxford Companion to the English Language, comes up with a similar figure. David Crystal further says that if you allow all scientific terms the total could easily reach two million (this doesn’t count the formal names for organisms I spoke about earlier, just technical vocabulary).
Assessing the size of the vocabulary of an individual is at least as problematical. Take Shakespeare: you’d think it would be easy to assess his vocabulary. We have the plays and sonnets and we just have to count the words in them (according to the American Heritage Dictionary, there are 884,647 of them, made up of 29,066 distinct forms, including proper names). But estimates of Shakespeare’s vocabulary vary from about 18,000 to 25,000 in various books, because writers have different views about what constitutes a distinct word.
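The gap between running words and distinct forms is easy to reproduce on any text. The following is a minimal sketch in Python, not the method used for the Shakespeare figures quoted above; the sample sentence, the regular expression, and the function name `count_words` are illustrative assumptions. It shows how a single decision (whether to fold upper and lower case together) already changes the number of distinct forms.

```python
import re
from collections import Counter

def count_words(text, fold_case=True):
    """Return (running words, distinct forms) for a text.

    Assumption: a "word" is a run of letters, optionally with internal
    apostrophes or hyphens. A different rule gives different counts.
    """
    tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    if fold_case:
        tokens = [t.lower() for t in tokens]
    forms = Counter(tokens)
    return len(tokens), len(forms)

sample = "Climb the hill. The climber climbs; the climbers climbed. Climb!"
print(count_words(sample, fold_case=True))   # (10, 7)
print(count_words(sample, fold_case=False))  # (10, 8)
```

Lumping climbs, climbed, climber and climbers under climb would shrink the count again, which is the kind of choice that lets different writers reach different totals from the same text.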
It’s common to see figures for vocabulary quoted such as 10,000-12,000 words for a 16-year-old, and 20,000-25,000 for a college graduate. These seem not to have much research to back them up. Usually they don’t make clear whether active or passive vocabulary is being quoted, and they don’t account for differences in lifestyle, profession and hobby interests between individuals.
David Crystal described a simple research project — using random pages from a dictionary — that suggests these figures are severe underestimates. He concludes that a better average for a college graduate might be 60,000 active words and 75,000 passive ones. But this method of assessing vocabulary counts dictionary headwords only; it would be possible to multiply it several-fold to include different senses, inflected forms, and compounds. Another assessment — of a million-word collection of American texts — identified about 38,000 headwords. Bearing in mind this was all general writing, this doesn’t sound so different from David Crystal’s estimates for graduate vocabularies.
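The dictionary-sampling idea can be sketched as a simple proportion: mark the headwords you know on a random sample of pages, then scale that fraction up to the whole dictionary. This is only a rough illustration in Python; the function name and every number in the example are hypothetical and are not taken from Crystal's project.

```python
def estimate_vocabulary(known_in_sample, headwords_in_sample, total_headwords):
    """Scale the known fraction of a sample up to the whole dictionary."""
    if headwords_in_sample <= 0:
        raise ValueError("sample must contain at least one headword")
    fraction_known = known_in_sample / headwords_in_sample
    return round(fraction_known * total_headwords)

# Hypothetical figures: 600 headwords sampled, 360 of them known,
# from a dictionary assumed to hold 100,000 headwords.
print(estimate_vocabulary(360, 600, 100_000))  # 60000
```

Because the estimate counts headwords only, it inherits whatever the chosen dictionary decides to list as a headword.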
http://www.askoxford.com/asktheexperts/faq/aboutenglish/numberwords
There is no single sensible answer to this question. It is impossible to count the number of words in a language, because it is so hard to decide what counts as a word. Is dog one word, or two (a noun meaning 'a kind of animal', and a verb meaning 'to follow persistently')? If we count it as two, then do we count inflections separately too (dogs plural noun, dogs present tense of the verb)? Is dog-tired a word, or just two other words joined together? Is hot dog really two words, since we might also find hot-dog or even hotdog?
It is also difficult to decide what counts as 'English'. What about medical and scientific terms? Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Youth slang? Computing jargon?
The Second Edition of the Oxford English Dictionary contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries. Over half of these words are nouns, about a quarter adjectives, and about a seventh verbs; the rest is made up of interjections, conjunctions, prepositions, suffixes, etc. These figures take no account of entries with senses for different parts of speech (such as noun and adjective).
This suggests that there are, at the very least, a quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED, or words not yet added to the published dictionary, of which perhaps 20 per cent are no longer in current use. If distinct senses were counted, the total would probably approach three quarters of a million.
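The arithmetic behind the "quarter of a million" and "20 per cent" figures can be checked from the three counts quoted above. A short Python check follows; the variable names are chosen here for illustration, and the derivative count is the approximate figure given in the text.

```python
current = 171_476      # full entries for words in current use
obsolete = 47_156      # obsolete words
derivatives = 9_500    # approximate count of derivative subentries

total = current + obsolete + derivatives
obsolete_share = obsolete / total

print(total)                    # 228132
print(f"{obsolete_share:.1%}")  # 20.7%
```

The sum of about 228,000 is presumably what the passage rounds to a quarter of a million, and the obsolete share matches the "perhaps 20 per cent" quoted.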
...It seems quite probable that English has more words than most comparable world languages. The reason for this is historical. English was originally a Germanic language, related to Dutch and German, and it shares much of its grammar and basic vocabulary with those languages. However, after the Norman Conquest in 1066 it was hugely influenced by Norman French, which became the language of the ruling class for a considerable period, and by Latin, which was the language of scholarship and of the Church. Very large numbers of French and Latin words entered the language. Consequently, English has a much larger vocabulary than either the Germanic languages or the members of the Romance language family to which French belongs.
English is also very ready to accommodate foreign words, and as it has become an international language, it has absorbed vocabulary from a large number of other sources. This does, of course, assume that you ignore 'agglutinative' languages such as Finnish, in which words can be stuck together in long strings of indefinite length, and which therefore have an almost infinite number of 'words'.
Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
See truespel.com - and the 4 truespel books plus "Occasional Poems" at authorhouse.com.
------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org