Retained Information vs. Random Noise

ECOLING at aol.com
Wed Sep 8 13:52:47 UTC 1999


John McLaughlin's contribution received today confirms
the interpretation I was making of his presentation.

That is, one gets an increasing number of PAIRWISE
matches between languages, as one increases
the number of languages involved,

IF ONE DOES NOT CARE WHAT PROPORTION
of the many languages are represented in each purported
potential cognate set.

As soon as one does care about this, the reasoning used
by McLaughlin becomes merely a limiting case,
in the event that one does not have any serious controls
(as I previously stated).

This is a legitimate criticism of some of Greenberg's
work.

But it is not a criticism of Multilateral Comparison,
although McLaughlin presents it that way.
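
The contrast above, between counting any pairwise match and caring
what proportion of the languages is represented in a set, can be
sketched numerically. What follows is only a toy model under loud
assumptions, not McLaughlin's calculation and not anything Greenberg
did: each language's word for a meaning is assumed to fall at random
into one of n_shapes equally likely "shape classes", and two words
count as lookalikes when they share a class. The names p_any_pair,
p_set_of_size, n_shapes and min_size are all hypothetical.

```python
import random
from collections import Counter


def p_any_pair(n_langs, n_shapes):
    """Exact chance that at least two of n_langs words share a shape class.

    This is the classic birthday-problem calculation, under the ASSUMED
    toy model of independent, uniformly random shape classes.
    """
    p_all_distinct = 1.0
    for i in range(n_langs):
        p_all_distinct *= max(n_shapes - i, 0) / n_shapes
    return 1.0 - p_all_distinct


def p_set_of_size(n_langs, n_shapes, min_size, trials=20000, seed=0):
    """Monte Carlo estimate of the chance that some shape class is shared
    by at least min_size of the n_langs languages (a coverage control)."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(n_shapes) for _ in range(n_langs))
        if max(counts.values()) >= min_size:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (2, 5, 10, 20, 40):
        print(n, round(p_any_pair(n, 50), 3), p_set_of_size(n, 50, max(2, n // 4)))
```

In this toy model the chance of at least one accidental pair rises
steeply as languages are added, while the chance of an accidental set
covering a fixed proportion of the languages does not. How far real
lexical data departs from such a model is exactly the kind of thing
for which we lack empirical studies.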

The most basic Multilateral Comparison, like the work
under Catherine the Great, and like the small table in
Greenberg's theoretical intro section to Language in the Americas,
has all cells filled: a word is included for each language in the
sample for each semantic category defined.  As a result,
the only judgements being made are about closer or less close.
The judgements are NOT about whether a given pairing of words
is a cognate or not in any absolute sense.  It is the strength of
the method that it can work without the need for absolute
knowledge on that kind of question.
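
A minimal sketch of what "all cells filled, only closer or less
close" can look like in practice is given here. It is an illustration
under stated assumptions, not a reconstruction of any table Greenberg
published: the three-language, three-meaning table is a tiny sample
chosen for familiarity, and crude_similarity (a hypothetical name) uses
Python's difflib string matching on spellings as a very rough stand-in
for a linguist's judgement of resemblance. The output is only a
RANKING of language pairs; nothing in it says whether any pair of
words is cognate.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Every cell is filled: one word per language per meaning.
# Spellings are lowercased; this is an illustrative toy sample only.
TABLE = {
    "English": {"water": "water", "hand": "hand", "night": "night"},
    "German": {"water": "wasser", "hand": "hand", "night": "nacht"},
    "French": {"water": "eau", "hand": "main", "night": "nuit"},
}


def crude_similarity(a, b):
    """ASSUMED stand-in for a resemblance judgement: difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def rank_pairs(table):
    """Rank language pairs by mean similarity over the shared meanings.

    Returns a list of ((lang_x, lang_y), score), closest pair first.
    The scores are meaningful only relative to one another.
    """
    meanings = sorted(set.intersection(*(set(words) for words in table.values())))
    scores = {}
    for x, y in combinations(sorted(table), 2):
        per_meaning = [crude_similarity(table[x][m], table[y][m]) for m in meanings]
        scores[(x, y)] = sum(per_meaning) / len(per_meaning)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for pair, score in rank_pairs(TABLE):
        print(pair, round(score, 3))
```

Because every language contributes a word for every meaning, no
decision is ever made about whether to admit a pairing into a set;
the only product is the ordering of pairs by relative closeness.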

There is a second reason why the criticism is not definitive for
Greenberg even in his most lax approaches.
We have no empirical studies of how fast the discovery of lookalikes
which might be considered plausible potential cognates (for further work)
will decrease with increasing distance of languages from each other,
nor of what proportion of a group of descendant languages is
likely to be included in such cognate sets.

Greenberg's valid point is that, if we want to avoid MISSING any
valid cognates
(that is, in my own words, if we want to be over-inclusive at first,
and I personally would limit this to pioneer stages of classification
at any given depth)
then we must recognize that true cognates will be retained
in a gradually decreasing proportion of related languages as we
increase the time and changes separating those languages.
THERE WILL PROBABLY BE some valid cognates retained
in only two of some sample of 40 very distantly related but
indeed genetically related languages.

Because the severest critics of Greenberg complain that his methods
are not formalizable, we cannot then draw absolute conclusions
about the relative roles of valid cognates preserved sporadically
in only two or a few languages, vs. pseudo-cognates made possible
by random noise.  Without complete formalization, it is impossible
to draw absolute conclusions such as the following:

>Greenberg's "Amerind" classification
>never rises above the level of random noise.

We simply don't know that.
To assume it is true, we would, I think, have to assume that
pseudo-cognates are just as easy to find as true cognates,
taken for the sample as a whole.  (Notice that this is stated
as a RELATIVE COMPARISON OF EASE.)

That assumption strikes me as counter-intuitive.
It amounts to assuming that the small amount of
information retained from the common proto-language is
not there at all.  If ANY of it were still there,
it should make it at least some tiny degree more likely that we
would find pairs of lookalikes which actually are cognates,
than pairs which actually are not ("actually" in the light of
some ideal complete knowledge of the distant future).

The core difficulty in discussions between Greenberg and
critics is to me that the results of the Multilateral Comparison
COMPONENT of the many methods Greenberg uses are
taken to yield results of the kind claimed by the Comparative
Method, when they do not at all.  They yield only
RELATIVE COMPARISONS, not statements of relation vs.
non-relation, of cognacy vs. non-cognacy, taken as absolutes.
There are not even any absolute levels of confidence in the
results, only relative ones.

Multilateral Comparison yields results of the type:
X and Y are likely to be more closely related than X and Z.
It includes the converse, that X and Z are likely to be less closely
related than X and Y (including possibly unrelated).
That is all.

Comparative Method yields results of the type:
X and Y are related, and these [stated]
are the sound correspondences which were involved.

I must emphasize again, Greenberg's work is not a definition
of Multilateral Comparison.  Nor is Greenberg's actual
accomplishment necessarily the same as what even he says it is,
just as is the case for many other writers and researchers.

Multilateral Comparison existed before Greenberg and
will exist after him, and has in fact been successfully used by
pioneers in language comparison and reconstruction world-wide,
in doing *triage* to select which set of languages to study
more intensively, often using the Comparative Method.

Best wishes,
Lloyd Anderson
Ecological Linguistics



More information about the Indo-european mailing list