% of English words from Latin and Greek

Tom Zurinskas truespel at HOTMAIL.COM
Thu Nov 5 02:35:21 UTC 2009


But what are the origins of the words when examined by word frequency
(counting word repetition)?  To analyze this you need a frequency count such as at

Here the top 4,000 most frequent words were counted as having 100 million
instances (e.g. the top word "the" has 6 million instances).  Put these in a
spreadsheet and put an origin mark next to the top 100.
The top 100 words are said to represent 50% of the words we say or see in
print.  These are the simple words of English.  I'd say they mostly are not
Latinate, but I'm no expert here.
They say the top 600 words make up 75% of what we say or see in print, if
you want to go that far.  The 4,000 words make up perhaps 90%.
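
The spreadsheet tally described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the word
counts below are made-up placeholders rather than figures from any real
frequency list, the origin marks are a rough hand labeling, and the names
FREQ_LIST, coverage and origin_shares are hypothetical.

```python
from collections import Counter

# Hypothetical toy frequency list: (word, instances, origin mark).
# The counts are invented placeholders, NOT real corpus data; a real run
# would load the full frequency list with hand-entered origin marks.
FREQ_LIST = [
    ("the", 600, "Germanic"),
    ("of", 300, "Germanic"),
    ("and", 250, "Germanic"),
    ("very", 50, "Latinate"),
    ("people", 40, "Latinate"),
    ("because", 30, "Mixed"),
]


def coverage(freq_list, top_n):
    """Fraction of all counted instances covered by the top_n words."""
    ranked = sorted(freq_list, key=lambda row: row[1], reverse=True)
    total = sum(count for _, count, _ in ranked)
    covered = sum(count for _, count, _ in ranked[:top_n])
    return covered / total


def origin_shares(freq_list, top_n, weighted=True):
    """Share of each origin among the top_n words.

    weighted=True counts every instance (what we say or see in print);
    weighted=False counts each distinct word once (a dictionary-style count).
    """
    ranked = sorted(freq_list, key=lambda row: row[1], reverse=True)[:top_n]
    tally = Counter()
    for _, count, origin in ranked:
        tally[origin] += count if weighted else 1
    total = sum(tally.values())
    return {origin: n / total for origin, n in tally.items()}


if __name__ == "__main__":
    print("coverage of top 3:", round(coverage(FREQ_LIST, 3), 3))
    print("by instance:", origin_shares(FREQ_LIST, 6, weighted=True))
    print("by distinct word:", origin_shares(FREQ_LIST, 6, weighted=False))
```

Counting by instances rather than by distinct words is the point of the
exercise: a dictionary-style count treats "the" and a rare word alike, while
a frequency-weighted count reflects what is actually said or printed.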

Tom Zurinskas, USA - CT20, TN3, NJ33, FL7+
see truespel.com phonetic spelling

> Date: Wed, 4 Nov 2009 22:08:50 +0000
> From: robin.hamilton2 at BTINTERNET.COM
> Subject: Re: % of English words from Latin and Greek
>> This is digging up very old and unreliable memories, but I believe I
>> read at some point that about 50 percent of English vocabulary was
>> Latinate in origin (including all those French words the Normans
>> introduced) and 45 percent Anglo-Saxon in origin, while only about 5
>> percent came from third languages.
> [SNIP]
>> Stephen Hughes
> I wonder, whatever the percentages given were and whatever text they were
> derived from, if they could have, in the days before computers, been
> anything other than impressionistic? Now, it should be possible to examine
> the contents of the online OED and determine, based on the etymologies
> given, where any particular word came from.
> But even that apparently objective approach would be open to objection, as
> it would include obsolete words. Therefore, as a first approximation to a
> serious answer to the question, it would have to be time-stamped -- "The
> English language in 1000 / 1500 / 2000 included X% of words derived
> {directly} from Latin." (The {caveat} signals the problem of Latin terms
> derived from a French [or Italian, or whatever] intermediary. And geography
> as well as time is perhaps also a factor, when we consider that Scots
> developed as a separate and distinct branch of English, more than a dialect,
> after at least the early fifteenth century.)
> But there are still further problems in even this more apparently objective
> counting. What _is_ a word? A semantic item, in which case perhaps all
> varieties of the personal pronoun would be subsumed under "I" (we he she
> it), one "word"? But this would disadvantage English (which provides all
> the words of this nature) at the expense of Latin, while the opposite
> approach, counting each and every inflectional form as a separate word,
> would in turn disadvantage Latin in this race.
> So even if we restrict our answer to "English as found in the country of
> England in the year 1950," there are already problems.
> I've deliberately avoided the phrasing, "English as _spoken_ in the country
> of England," since it begs the question of just what is to be included.
> *All scientific and medical terminology? Botany would skew the sample
> wildly.
> Even on the level of the individual speaker, there would be problems, as the
> number of words recognised is larger than the number of words used. So
> which would we count?
> I don't think it would be impossible to get some sort of significant answer
> to a question regarding the proportion of foreign versus native origins of
> English terms at any particular time and place, but it would have to be
> couched carefully, and hedged with qualifications, to be anything other than
> a nonsense question with a meaningless answer.
> Robin Hamilton
> ------------------------------------------------------------
> The American Dialect Society - http://www.americandialect.org
