cognate vocabulary as a measure of relatedness

Adrian Clynes aclynes at
Wed Jun 12 07:41:00 UTC 2002

My thanks to those who responded to my query about percentage cognate
vocabulary as a measure of relatedness between languages, and about the
percentages for English compared with other languages:

To David Mead, Andy Pawley & Ross Clark, whose comments have already been
posted on this list,

Also to David Nash (& Nick Thieberger for passing on the query to David),
Paul Kroeger, Bob Blust, and Gillian Sankoff, whose comments were sent
direct to me, and are attached below.

The original query is at the very end of this message.

My thanks again,
Adrian Clynes

Here are the messages not yet posted on AN-LANG:

1) from Paul Kroeger:

"[SIL] often use 80% cognate as a rough boundary below which speakers of
different varieties are likely to have problems communicating. Gary Simons'
PhD dissertation at Cornell discussed the correlations between lexical
similarity and intelligibility. I'm sure there has been more recent work on
these issues, but I haven't kept up with it."

2) from David Nash:

"The answers to most, if not all, of your questions are in Kruskal, Dyen &
Black 1973, ref at
Their data is at "  [AC: the
latter is a useful source for Indo-European languages]

[AC: see also Alpher & Nash 1999 "Lexical replacement and cognate
equilibrium in Australia" Australian Journal of Linguistics 19(1): 5-56]

3) from Bob Blust:

"English and German score 60-65%, English and French about 20%. More exact
figures are available in the lexicostatistical literature. Isidore Dyen did
a book a few years ago on Indo-European subgrouping using lexicostatistics,
in which you can probably find the figures he uses.

Dyen, in his 'Lexicostatistical classification of the Austronesian
languages' and in other publications from that era, uses 70% basic
vocabulary cognation as the 'language limit' (viz. the point at which
related language communities cease being dialects of one language and
become distinct languages). Obviously, it is absurd to adhere to a number
like this in any rigid way (two phonologically conservative languages
sharing 69% of their basic vocabulary may have greater mutual
intelligibility than two which share 80% but of which one or both have
undergone extensive sound changes). Stephen Wurm, in much of his New Guinea
work, used 81% rather than 70% as the cut-off point.
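The cut-offs quoted in these replies (Dyen's 70%, Wurm's 81%, and the SIL
rule of thumb of 80% that Paul Kroeger frames in terms of likely
communication problems) can be compared mechanically. The Python sketch
below is illustrative only: `CUTOFFS` and `classify` are hypothetical names,
and counting a score exactly at the cut-off as "same language" is an assumed
convention, not one stated by any of the sources. It shows how the same
score can land on different sides of the line depending on which cut-off is
adopted.

```python
# Illustrative sketch only: CUTOFFS and classify are hypothetical names.
# The percentages are the cut-offs quoted in the replies on this thread.
CUTOFFS = {
    "Dyen": 70.0,
    "SIL rule of thumb": 80.0,
    "Wurm": 81.0,
}


def classify(percent_cognate: float, cutoff: float) -> str:
    """Label two varieties using a single lexicostatistic cut-off.

    Assumed convention: a score at or above the cut-off counts as
    "same language", a score below it as "distinct languages".
    """
    if not 0.0 <= percent_cognate <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_cognate >= cutoff:
        return "same language"
    return "distinct languages"


if __name__ == "__main__":
    # 69% and 80% are the two scores in Blust's example above.
    for score in (69.0, 80.0):
        for source, cutoff in CUTOFFS.items():
            print(f"{score:.0f}% under {source} ({cutoff:.0f}%): "
                  f"{classify(score, cutoff)}")
```

Run as a script, 69% comes out as "distinct languages" under all three
cut-offs, while 80% counts as one language under Dyen's and the SIL figure
but as two under Wurm's. The score itself says nothing about how far the
shared words have diverged phonologically, which is Blust's point about
sound change.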

Darrell Tryon followed Wurm in this respect when he did his 'New Hebrides
languages' book (PL) in 1976 or so. A recent conference organized by the
archaeologist Colin Renfrew at the McDonald Institute for Archaeological
Research in Cambridge, England, focused on archaeological and linguistic
approaches to issues of time-depth, and resulted in a published volume (I'm
on sabbatical in Taiwan and don't have direct access to it, so I can't give
you the exact title now)."

4) from Gillian Sankoff:

"In 1969 David Sankoff used C. D. Buck's "Dictionary of selected synonyms in
. . . . Indo-European languages" to make such a calculation and to build a
family tree by this grouping method, which was then compared with the
groupings derived from the comparative method. I THINK the results are to
be found in the following article, which I don't have to hand:

"On the rate of replacement of word-meaning relationships", Language 46(3),
September 1970: 564-569.
It has the basic figures for French-English, English-German, etc. You might
be able to get some more info directly from David Sankoff:
sankoff at ERE.UMontreal.CA.

If I remember correctly, below 75-80% in common pretty much gives you a
situation of mutual unintelligibility; 80-85% gets you into the range of the
Scandinavian languages, I think. [...]

p.s. there is more recent work by Donald Ringe (dringe at
and Tandy Warnow, developing more sophisticated algorithms if you are

Original query:
I would be grateful for a quick answer or answers to one or more of the
following questions:
1) What percentage of basic vocabulary is cognate in English and German,
using (say) a 200-item Swadesh list?
2) Ditto, for English and French?
3) Is there a commonly accepted percentage of cognate basic vocabulary, at
or below which one might expect two varieties to be considered distinct
languages, rather than dialects of the same language?
4) Can you recommend a reference discussing this kind of approach?
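Questions 1 and 2 rest on a simple ratio: the number of meanings on the list
whose words are judged cognate, divided by the number of meanings that could
be compared. A minimal Python sketch of that arithmetic follows, assuming the
cognate judgments have already been made by the comparative method. The
function name `percent_cognate` is hypothetical, and leaving missing or
unusable items out of the denominator is an assumed convention rather than a
fixed rule.

```python
from typing import Optional, Sequence


def percent_cognate(judgments: Sequence[Optional[bool]]) -> float:
    """Percentage of comparable list items judged cognate.

    Each entry stands for one meaning on the list (e.g. a 200-item
    Swadesh list): True = cognate, False = not cognate, None = no usable
    comparison. Assumption: None entries are dropped from the denominator.
    """
    comparable = [j for j in judgments if j is not None]
    if not comparable:
        raise ValueError("no comparable items")
    # sum() counts True as 1 and False as 0.
    return 100.0 * sum(comparable) / len(comparable)


if __name__ == "__main__":
    # Toy judgments for ten meanings, invented for illustration only.
    toy = [True] * 6 + [False] * 3 + [None]
    print(f"{percent_cognate(toy):.1f}%")  # prints 66.7% (6 of 9 items)
```

With real data the sequence would hold one judgment per list item; the toy
values above are not drawn from any language pair.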

