Hypothesis formation vs. testing

Tue Aug 24 14:07:44 UTC 1999

On Thu, 19 Aug 1999 ECOLING at aol.com wrote:

[LT]

>> I fully agree that the question `Are all languages related?' cannot be
>> answered at present.  I further believe that we will never be able to
>> answer this question by purely linguistic means.

>> However, there are people who disagree, one of the most prominent being
>> Merritt Ruhlen.  Ruhlen wishes to embrace the conclusion `All languages
>> are related.'

> As I have understood Joseph Greenberg's clearer and more cogent
> statements, his own work actually does NOT propose to prove any such
> conclusion.  It is rather an ASSUMPTION that all languages are or might
> be related (i.e. we are not to exclude that).

The assumption that all languages are related is out of order.

The assumption that all languages *might* be related is hardly an
assumption at all, and in any case such an idea is excluded by no one.

> Greenberg's method of comparison serves to find the CLOSEST
> resemblances (merely that, CLOSEST). In the Americas, his method
> leads him to the conclusion (no surprise) that Eskimo-Aleut is not
> closely related to any other Amerindian languages, and that
> Athabaskan / Na-Dene (with outliers) is not closely related to any
> other Amerindian languages (though conceivably not as distant from
> them as Eskimo-Aleut?).

This account of Greenberg's work makes it appear to resemble certain far
more rigorous work in progress elsewhere, such as at Cambridge
University.  The Cambridge group are working with a variety of
algorithms which can, in principle, determine degree of closeness, and
which can hence produce unrooted trees illustrating relative linguistic
distance.  But these algorithms are utterly incapable of distinguishing
relatedness from unrelatedness.  If, for example, you run one of the
algorithms with a bunch of IE languages plus Basque and Chinese, the
result is a tree showing Basque and Chinese as the most divergent
members -- that's all.
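
To make the point concrete, here is a minimal sketch in Python.  It is
not one of the Cambridge algorithms: it is plain average-linkage
clustering, which yields a rooted hierarchy rather than an unrooted
tree, and the distance values are invented toy numbers, not measured
data.  The function names are likewise hypothetical.  What it shows is
only that a distance-based procedure always returns a single tree
containing every language it is given, with the unrelated ones simply
joining last.

```python
from itertools import combinations

# Toy pairwise distances (0 = identical, 1 = maximally different).
# These numbers are invented for illustration; they are NOT real measurements.
TOY_DISTANCES = {
    ("Spanish", "Italian"): 0.20,
    ("Spanish", "English"): 0.55,
    ("Spanish", "Basque"): 0.95,
    ("Spanish", "Chinese"): 0.97,
    ("Italian", "English"): 0.57,
    ("Italian", "Basque"): 0.96,
    ("Italian", "Chinese"): 0.97,
    ("English", "Basque"): 0.96,
    ("English", "Chinese"): 0.98,
    ("Basque", "Chinese"): 0.99,
}


def leaves(node):
    """Return the language names under a tree node (a name or a pair of nodes)."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def leaf_distance(a, b):
    """Look up the toy distance between two languages, in either order."""
    return TOY_DISTANCES.get((a, b), TOY_DISTANCES.get((b, a)))


def cluster_distance(x, y):
    """Average distance over all cross-cluster language pairs."""
    pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
    return sum(leaf_distance(a, b) for a, b in pairs) / len(pairs)


def build_tree(languages):
    """Repeatedly merge the two closest clusters until one tree remains."""
    clusters = list(languages)
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (x, y)]
        clusters.append((x, y))
    return clusters[0]


tree = build_tree(["Spanish", "Italian", "English", "Basque", "Chinese"])
print(tree)
```

Nothing in the result marks Basque or Chinese as unrelated to the
rest.  They are merely attached at the greatest distance, and the
procedure has no way of saying anything stronger.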

If the same is true of Greenberg's highly informal approach, then G
cannot distinguish relatedness from unrelatedness, and he has no
business setting up imaginary "families".

> His actual conclusions are about relative UNRELATEDNESS of language
> families (notice, not about absolute unrelatedness, which he does
> not claim his method has the power to evaluate).

Just as well.  It is logically impossible to prove absolute
unrelatedness, and G would be mad to undertake such a thing.

> Beyond that, Greenberg's methods do NOT enable him to establish any
> similar degree of unrelatedness among the remaining languages of the
> Americas.

> I hope I have stated that carefully enough, to make obvious that it
> is a matter of degree, not absolutes, and that Greenberg's method
> actually demonstrates the points of SEPARATION rather than the
> points of UNION.

Fine, but then G's methods do not suffice to set up language families --
even though that is exactly what he does.

> Greenberg's method is potentially useful in that it is likely to
> reveal some deep language family relationships which were not
> previously suspected,

I said exactly this on page 389 of my textbook.

> AS LONG AS we do not introduce systematic biases
> which overpower whatever residual similarities still exist
> despite all of the changes which obscure those deep relationships.

I'd be interested to know just what `systematic biases' you have in
mind.

> In other words, mere noise in the data, or dirty data,
> if the noise or dirt are random, should not be expected to selectively
> bias our judgements of closeness of resemblance...

No.  I can't agree.

Suppose two languages A and B are genuinely but distantly related.
In this case, it is at least conceivable that false positives (spurious
matches) would be counterbalanced by false negatives (the overlooking of
genuine evidence).

But suppose the two languages are not in fact related at all.  In this
case, false negatives cannot exist, because there is no genuine evidence
to be overlooked.  Hence the only possible errors are false positives:
spurious evidence.  And the great danger is that the accumulation of
false positives will lead to the positing of spurious relationships.
Many of G's critics have hammered him precisely on this point.
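
The arithmetic behind this worry can be sketched in a few lines of
Python.  The figures are purely hypothetical: I assume some fixed
chance `p_match` that any one comparison yields an accidental
resemblance, and I assume the comparisons are independent, which is a
simplification.  Neither number comes from Greenberg's work or from
anyone's data.

```python
def expected_chance_matches(comparisons, p_match):
    """Expected number of accidental resemblances among the comparisons."""
    return comparisons * p_match


def prob_at_least_one(comparisons, p_match):
    """Probability of one or more accidental resemblances,
    assuming the comparisons are independent."""
    return 1 - (1 - p_match) ** comparisons


# Hypothetical figures: 500 word comparisons, 2% chance of a spurious match each.
comparisons = 500
p_match = 0.02

expected = expected_chance_matches(comparisons, p_match)
at_least_one = prob_at_least_one(comparisons, p_match)
print(expected, at_least_one)
```

If the languages are unrelated, every one of those expected matches is
spurious, and there is nothing on the other side of the ledger to
offset them.  Widening the search, whether by adding more words, more
languages, or looser criteria of resemblance, only raises the count.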

> [and we can study how we make such judgements to try to strengthen
> this component of Greenberg's method, to strengthen their robustness
> against noisy data and our mental failings of judgement]

In their present form, G's methods appear to me to have no robustness at
all.  Words are similar if Greenberg says they are.  And languages are
related if Greenberg judges that he has found enough similarities
between them.  There are no objective criteria or procedures at all, and
there is no possibility that anyone else could replicate G's work.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk