Retained Information vs. Random Noise

Larry Trask larryt at cogs.susx.ac.uk
Thu Sep 9 14:27:55 UTC 1999


On Wed, 8 Sep 1999 ECOLING at aol.com wrote:

> The most basic Multilateral Comparison, like the work
> under Catherine the Great, and like the small table in
> Greenberg's theoretical intro section to Language in the Americas,
> has all cells filled: a word is included for each language in the
> sample for the semantic category defined.  As a result,
> the only judgements being made are about closer or less close.
> The judgements are NOT about whether a given pairing of words
> is a cognate or not in any absolute sense.  It is the strength of
> the method that it can work without the need for absolute
> knowledge on that kind of question.

But that is not how Greenberg does it, and not how any proponent of MC I
have ever heard of does it.  All the MC people I have ever seen claim
absolute cognation, and they do this expressly and at times heatedly.
You ought to see the very rude mail I get from some of the MC people
when I challenge their comparisons.

> There is a second reason why the criticism is not definitive for
> Greenberg even in his most lax approaches.  We have no empirical
> studies of how fast the discovery of lookalikes which might be
> considered plausible potential cognates (for further work) will
> decrease, with increasing distance of languages from each other, nor
> with what proportion of a group of descendant languages will be
> likely to be included in such cognate sets.

Actually, we do have a few empirical studies of a lexicostatistical
nature, but these, for obvious reasons, cannot normally include a time
factor.  Some of the most interesting work of this sort I have seen is
still unpublished, but should be published within about a year.

> Greenberg's valid point is that, if we want to avoid MISSING any
> valid cognates

Sorry, but I don't think this is Greenberg's point at all, though it's
admittedly easy to interpret him like this.

Anyway, the point is *not* to avoid missing any valid cognates -- a
pointless and futile enterprise, in my view.  The point is to find
sufficient positive evidence for relatedness, over and above chance
resemblances, that the null hypothesis of unrelatedness cannot be
maintained.

> (that is, in my own words, if we want to be over-inclusive at first,
> and I personally would limit this to pioneer stages of classification
> at any given depth)
> then we recognize that among true cognates, they will be retained
> in a gradually decreasing proportion of related languages as we
> increase the time and changes separating those languages.
> THERE WILL PROBABLY BE some valid cognates retained
> in only two of some sample of 40 very distantly related but
> indeed genetically related languages.

No quarrel there, but why stop at two languages?  Distantly related
languages may in fact retain no cognates at all, a state of affairs
generally indistinguishable from unrelatedness.

Anyway, the point is not whether any true cognates survive, but whether
such sparse cognates can be distinguished from chance resemblances.
And that question requires rigorous mathematical methods.
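
As a rough illustration of what such a method can look like, here is a
minimal sketch of a permutation test on two meaning-aligned wordlists.
It is not a procedure described anywhere in this thread. The word forms
are invented placeholders, and the resemblance criterion (identical
first letter) and the names initial_match, count_matches and
permutation_p_value are assumptions made purely for illustration.

```python
import random


def initial_match(a, b):
    """Crude resemblance criterion (an assumption): same first letter."""
    return a[0] == b[0]


def count_matches(list_a, list_b, match=initial_match):
    """Count meaning slots whose two forms 'resemble' each other."""
    return sum(match(a, b) for a, b in zip(list_a, list_b))


def permutation_p_value(list_a, list_b, trials=10000, seed=0,
                        match=initial_match):
    """Estimate how often chance alone yields at least the observed score.

    Shuffling list_b breaks the link between form and meaning, so the
    scores of the shuffled lists approximate the chance distribution.
    """
    rng = random.Random(seed)
    observed = count_matches(list_a, list_b, match)
    shuffled = list(list_b)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if count_matches(list_a, shuffled, match) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (trials + 1)


# Invented placeholder forms, one per meaning slot; NOT real language data.
lang_a = ["pata", "kuno", "mira", "tolu", "sen", "nako", "lime", "weta"]
lang_b = ["piru", "kema", "mosu", "tiga", "sula", "nebi", "lado", "wiku"]
lang_c = ["sota", "lamu", "wike", "pano", "tere", "kusi", "nalo", "mido"]

obs_ab, p_ab = permutation_p_value(lang_a, lang_b)
obs_ac, p_ac = permutation_p_value(lang_a, lang_c)
print(obs_ab, p_ab)
print(obs_ac, p_ac)
```

A small p-value here says only that the aligned lists match more often
than shuffled ones under this one criterion. It says nothing about
whether the matches come from common descent rather than borrowing or
sound symbolism, and a looser criterion raises the chance level that
has to be beaten.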

> Because the severest critics of Greenberg complain that his methods
> are not formalizable,

No.  Nobody is complaining that G's methods are not formalizable: even
comparative reconstruction is not formalizable.

The problem is that G's methods are utterly *inexplicit*, and that they
provide no basis for distinguishing cognates from chance resemblances.

> we cannot then draw absolute conclusions
> about the relative roles of valid cognates preserved sporadically
> in only two or a few languages, vs. pseudo-cognates made possible
> by random noise.  Without complete formalization, it is impossible
> to draw absolute conclusions such as the following:

>> Greenberg's "Amerind" classification
>> never rises above the level of random noise.

> We simply don't know that.

Agreed, but it is *Greenberg's* responsibility to demonstrate that his
Amerind comparisons rise significantly above the level of chance.  And
he hasn't even attempted this.

G's critics can see no reason to believe that G's comparisons *do* rise
above the chance level, and they say so.  It's not their responsibility
to devote years of effort to demonstrating that the comparisons don't
rise above chance.

> To assume it is true, we would I think have to assume that
> pseudo-cognates are just as easy to find as true cognates,
> taken for the sample as a whole.  (Notice that this is stated
> as a RELATIVE COMPARISON OF EASE.)

My experience of MC work is that spurious "cognates" are a whole lot
easier to find than real cognates.  Hell, I've done it myself, with
Basque and Hungarian (among others), and Lyle Campbell has e-published a
beautiful demonstration that Old Japanese fits Amerind better than any
single American language does.

> That assumption strikes me as counter-intuitive,

Really?  Try reading a few issues of Mother Tongue. ;-)

> as assuming that the small amount of
> information retained from the common proto-language is
> not there at all, because if ANY of it were still there,
> it should make at least some tiny degree more likely that we
> would find pairs of lookalikes which actually are cognates,
> than pairs which actually are not ("actually" in the light of
> some ideal complete knowledge of the distant future).

Sorry, but intuitions are not relevant.  You need hard-nosed statistical
tests, of a sort which are currently being developed, though we still
have a way to go.

> The core difficulty in discussions between Greenberg and
> critics is to me that the results of the Multilateral Comparison
> COMPONENT of the many methods Greenberg uses are
> taken to yield results of the kind claimed by the Comparative
> Method, when they do not at all.  They yield only
> RELATIVE COMPARISONS, not statements of relation vs.
> non-relation, of cognacy vs. non-cognacy, taken as absolutes.
> There are not even any absolute levels of confidence in the
> results, only relative ones.

That's not how Greenberg sees it, or Ruhlen, or Bengtson, or Fleming, or
any other proponent of MC I've come across.

> Multilateral Comparison yields results of the type:
> X and Y are likely to be more closely related than X and Z.
> It includes the converse, that X and Z are likely to be less closely
> related than X and Y (including possibly unrelated).
> That is all.

Regardless of whether this statement is true or not, it has nothing to
do with the results claimed by Greenberg and by other proponents of MC,
who certainly do claim proof of absolute relatedness -- in Ruhlen's
case, extending to all known languages.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk


