% of English words from Latin and Greek

Tom Zurinskas truespel at HOTMAIL.COM
Thu Nov 5 02:35:21 UTC 2009

But what are the origins of the words when examined by word frequency
(counting word repetition)?  To analyze this you need a frequency count such as at

Here the top 4,000 most frequent words were counted having 100 million
instances (e.g. the top word "the" has 6 million instances).  Put these in a
spreadsheet and put an origin mark next to the top 100.
The top 100 words are said to represent 50% of the words we say or see in
print.  These are the simple words of English.  I'd say they mostly are not
latinate, but I'm no expert here.
They say the top 600 words make up 75% of what we say or see in print if you
want to go that far.  The 4,000 words make up perhaps 90%.
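
For anyone who would rather script the spreadsheet step, here is a minimal
sketch in Python.  It assumes a frequency list has already been loaded as
(word, count, origin) rows.  The rows, the counts, and the two origin labels
below are made-up toy values for illustration only, not figures from any real
frequency list, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows: (word, count, hand-assigned origin tag).
# The counts are invented so the arithmetic is easy to follow.
ROWS = [
    ("the", 600, "Germanic"),
    ("of", 300, "Germanic"),
    ("and", 270, "Germanic"),
    ("very", 20, "Latinate"),
    ("people", 10, "Latinate"),
]


def ranked(rows):
    """Return the rows sorted from most to least frequent."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def coverage(rows, top_n):
    """Fraction of all counted instances accounted for by the top_n words."""
    ordered = ranked(rows)
    total = sum(count for _, count, _ in ordered)
    covered = sum(count for _, count, _ in ordered[:top_n])
    return covered / total


def origin_share(rows, top_n, weighted=True):
    """Share of each origin among the top_n words.

    weighted=True counts every instance (running text);
    weighted=False counts each distinct word once (a word list).
    """
    totals = defaultdict(float)
    for _, count, origin in ranked(rows)[:top_n]:
        totals[origin] += count if weighted else 1
    grand_total = sum(totals.values())
    return {origin: value / grand_total for origin, value in totals.items()}


if __name__ == "__main__":
    print(coverage(ROWS, 1))
    print(origin_share(ROWS, 5, weighted=True))
    print(origin_share(ROWS, 5, weighted=False))
```

The weighted and unweighted shares can differ a great deal, so it matters
whether you count each word once or count every repetition.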

>> This is digging up very old and unreliable memories, but I believe I
>> [...] Latinate in origin (including all those French words the Normans
>> introduced) and 45 percent Anglo-Saxon in origin, while only about 5
>> percent came from third languages.
>> Stephen Hughes
> I wonder, whatever the percentages given were and whatever text they were
> derived from, if they could have, in the days before computers, been
> anything other than impressionistic? Now, it should be possible to examine
> the contents of the online OED and determine, based on the etymologies
> given, where any particular word came from.
> But even that apparently objective approach would be open to objection, as
> it would include obsolete words. Therefore, as a first approximation to a
> serious answer to the question, it would have to be time-stamped -- "The
> English language in 1000 / 1500 / 2000 included X% of words derived
> {directly} from Latin." (The {caveat} signals the problem of Latin terms
> derived from a French [or Italian, or whatever] intermediary. And geography
> as well as time is perhaps also a factor, when we consider that Scots
> developed as a separate and distinct branch of English, more than a dialect,
> after at least the early fifteenth century.)
> But there are still further problems in even this more apparently objective
> counting. What _is_ a word? A semantic item, in which case perhaps all
> varieties of the personal pronoun would be subsumed under "I" (we he she
> [...] the words of this nature) at the expense of Latin, while the opposite
> [...] would in turn disadvantage Latin in this race.
> So even if we restrict our answer to "English as found in the country of
> England in the year 1950," there are already problems.
> I've deliberately avoided the phrasing, "English as _spoken_ in the country
> of England," since it begs the question of just what is to be included.
> *All scientific and medical terminology? Botany would skew the sample
> wildly.
> Even on the level of the individual speaker, there would be problems, as the
> number of words recognised is larger than the number of words used. So
> which would we count?
> I don't think it would be impossible to get some sort of significant answer
> to a question regarding the proportion of foreign versus native origins of
> English terms at any particular time and place, but it would have to be
> couched carefully, and hedged with qualifications, to be anything other than
> a nonsense question with a meaningless answer.
> Robin Hamilton
