% of English words from Latin and Greek

Thu Nov 5 02:35:21 UTC 2009


But what are the origins of the words when they are examined by word
frequency (i.e., counting word repetitions)? To analyze this you need a
frequency count such as the one at
ftp://ftp.itri.bton.ac.uk/bnc/all.num.o5

Here the top 4,000 most frequent words were counted in a corpus of 100
million word instances (e.g., the top word, "the", has 6 million instances).
Put these in a spreadsheet and put an origin mark next to each of the top 100.
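For anyone who would rather script the spreadsheet step, here is a minimal Python sketch. It assumes each line of the frequency file starts with a count followed by the word (check the real column layout of the file before relying on this), and the counts and origin labels in the example are made up for illustration, not taken from the BNC list. The names `parse_frequency_list` and `origin_shares` are hypothetical. It reports origin shares both per distinct word and weighted by how often each word occurs, since those two figures can differ sharply.

```python
from collections import Counter


def parse_frequency_list(lines):
    """Turn lines of the assumed form 'COUNT WORD ...' into a Counter of word counts.

    ASSUMPTION: column 1 is the count and column 2 is the word; any further
    columns (e.g. a part-of-speech tag) are ignored, so entries that differ
    only in those columns are merged into one word.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank lines and anything without a leading count
        counts[fields[1].lower()] += int(fields[0])
    return counts


def origin_shares(counts, origins, top_n=100):
    """Origin shares among the top_n words, counted by type and by token."""
    top = counts.most_common(top_n)
    by_type, by_token = Counter(), Counter()
    for word, freq in top:
        label = origins.get(word, "unmarked")  # words not yet marked in the sheet
        by_type[label] += 1       # each distinct word counts once
        by_token[label] += freq   # each word weighted by its frequency
    total_types = len(top)
    total_tokens = sum(freq for _, freq in top)
    return (
        {label: n / total_types for label, n in by_type.items()},
        {label: n / total_tokens for label, n in by_token.items()},
    )


# Hypothetical example data: made-up counts, hand-assigned origin labels.
SAMPLE = ["6000 the at0", "3000 of prf", "600 very av0", "400 animal nn1"]
ORIGINS = {"the": "native", "of": "native", "very": "French", "animal": "Latin"}

if __name__ == "__main__":
    per_type, per_token = origin_shares(parse_frequency_list(SAMPLE), ORIGINS, top_n=4)
    print("by type: ", per_type)
    print("by token:", per_token)
```

With this toy data, half of the distinct words are native, but they account for 90% of the tokens.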
The top 100 words are said to represent 50% of the words we say or see in
print. These are the simple words of English. I'd say they are mostly not
Latinate, but I'm no expert here.

If you want to go further, the top 600 words are said to make up 75% of what
we say or see in print, and the full 4,000 perhaps 90%.
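The coverage figures above (top 100 about 50%, top 600 about 75%, top 4,000 about 90%) are cumulative shares of all word tokens. A small Python sketch of how one might compute them from a list of word counts; the function names `coverage` and `words_needed` are made up for illustration, and the actual percentages would have to come from a real frequency list, not from this code.

```python
def coverage(freqs, n):
    """Fraction of all tokens accounted for by the n most frequent words."""
    ranked = sorted(freqs, reverse=True)
    return sum(ranked[:n]) / sum(ranked)


def words_needed(freqs, target):
    """Smallest number of top-ranked words whose combined tokens reach `target` (0 to 1)."""
    ranked = sorted(freqs, reverse=True)
    total = sum(ranked)
    running = 0
    for rank, freq in enumerate(ranked, start=1):
        running += freq
        if running / total >= target:
            return rank
    return len(ranked)


if __name__ == "__main__":
    # Synthetic Zipf-like counts (frequency proportional to 1/rank) over a
    # hypothetical 50,000-word vocabulary; real corpus counts will differ.
    toy = [round(1_000_000 / r) for r in range(1, 50_001)]
    for n in (100, 600, 4000):
        print(n, f"{coverage(toy, n):.0%}")
    print("words needed for 50%:", words_needed(toy, 0.5))
```

The synthetic distribution is only there to exercise the functions; it is not a model of the BNC counts.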

Tom Zurinskas, USA - CT20, TN3, NJ33, FL7+
see truespel.com phonetic spelling

----------------------------------------
> Date: Wed, 4 Nov 2009 22:08:50 +0000
> From: robin.hamilton2 at BTINTERNET.COM
> Subject: Re: % of English words from Latin and Greek
> To: ADS-L at LISTSERV.UGA.EDU
>
> ---------------------- Information from the mail header -----------------------
> Sender: American Dialect Society
> Poster: Robin Hamilton
> Subject: Re: % of English words from Latin and Greek
> -------------------------------------------------------------------------------
>
>> This is digging up very old and unreliable memories, but I believe I
>> read at some point that about 50 percent of English vocabulary was
>> Latinate in origin (including all those French words the Normans
>> introduced) and 45 percent Anglo-Saxon in origin, while only about 5
>> percent came from third languages.
> [SNIP]
>>
>> Stephen Hughes
>
> I wonder, whatever the percentages given were and whatever text they were
> derived from, if they could have, in the days before computers, been
> anything other than impressionistic? Now, it should be possible to examine
> the contents of the online OED and determine, based on the etymologies
> given, where any particular word came from.
>
> But even that apparently objective approach would be open to objection, as
> it would include obsolete words. Therefore, as a first approximation to a
> serious answer to the question, it would have to be time-stamped -- "The
> English language in 1000 / 1500 / 2000 included X% of words derived
> {directly} from Latin." (The {caveat} signals the problem of Latin terms
> derived from a French [or Italian, or whatever] intermediary. And geography
> as well as time is perhaps also a factor, when we consider that Scots
> developed as a separate and distinct branch of English, more than a dialect,
> after at least the early fifteenth century.)
>
> But there are still further problems in even this more apparently objective
> counting. What _is_ a word? A semantic item, in which case perhaps all
> varieties of the personal pronoun would be subsumed under "I" (we, he, she,
> it) as one "word"? But this would disadvantage English (which provides all
> the words of this nature) in favour of Latin, while the opposite approach,
> counting each and every inflectional form as a separate word, would in turn
> disadvantage Latin in this race.
>
> So even if we restrict our answer to "English as found in the country of
> England in the year 1950," there are already problems.
>
> I've deliberately avoided the phrasing, "English as _spoken_ in the country
> of England," since it begs the question of just what is to be included.
> All scientific and medical terminology? Botany would skew the sample
> wildly.
>
> Even on the level of the individual speaker, there would be problems, as the
> number of words recognised is larger than the number of words used. So
> which would we count?
>
> I don't think it would be impossible to get some sort of significant answer
> to a question regarding the proportion of foreign versus native origins of
> English terms at any particular time and place, but it would have to be
> couched carefully, and hedged with qualifications, to be anything other than
> a nonsense question with a meaningless answer.
>
> Robin Hamilton
>
> ------------------------------------------------------------
> The American Dialect Society - http://www.americandialect.org
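Robin Hamilton's question above about what counts as a "word" can be made concrete with a toy example. In this Python sketch the mini-lexicon, its grouping of the personal-pronoun forms under "I" (following his suggestion), the function name `origin_share`, and the origin labels are all hypothetical; the point is only that the same list yields different Latinate percentages depending on whether distinct forms or grouped items are counted.

```python
from collections import Counter

# Hypothetical mini-lexicon: surface form -> (grouping unit, origin label).
# Grouping all personal-pronoun forms under "I" follows the suggestion in the
# message above; the origin labels are hand-assigned for illustration only.
LEXICON = {
    "I": ("I", "native"),
    "me": ("I", "native"),
    "my": ("I", "native"),
    "we": ("I", "native"),
    "us": ("I", "native"),
    "animal": ("animal", "Latinate"),
    "animals": ("animal", "Latinate"),
    "nation": ("nation", "Latinate"),
    "nations": ("nation", "Latinate"),
}


def origin_share(lexicon, unit):
    """Share of each origin label, counting distinct 'form's or distinct 'lemma's."""
    if unit == "form":
        # every inflected or variant form is its own item
        items = {form: origin for form, (_, origin) in lexicon.items()}
    elif unit == "lemma":
        # forms collapse into one item per grouping unit
        items = {lemma: origin for lemma, origin in lexicon.values()}
    else:
        raise ValueError("unit must be 'form' or 'lemma'")
    tally = Counter(items.values())
    total = sum(tally.values())
    return {origin: n / total for origin, n in tally.items()}


if __name__ == "__main__":
    print("by form: ", origin_share(LEXICON, "form"))
    print("by lemma:", origin_share(LEXICON, "lemma"))
```

Here the Latinate share is 4 of 9 when counting forms but 2 of 3 when counting grouped items, which is the kind of swing the message warns about.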