Hypothesis formation vs. testing

Thu Aug 19 13:24:12 UTC 1999

There is a great confusion in the advanced sciences,
or even in those which like to believe themselves advanced,
between
Hypothesis Testing
and
Hypothesis Formation.

When we cannot conclusively test certain hypotheses,
it is still legitimate to try to accumulate evidence that the hypotheses
are plausible and worth exploring further.

In a message dated 8/18/99 11:32:41 PM, Larry Trask writes:

>On Thu, 12 Aug 1999 ECOLING at aol.com wrote:

>> No Burden of Proof is appropriate on the content of the question
>> whether all languages are ultimately related,
>> simply because we cannot test that question currently.

>I fully agree that the question `Are all languages related?' cannot be
>answered at present.  I further believe that we will never be able to
>answer this question by purely linguistic means.

>However, there are people who disagree, one of the most prominent being
>Merritt Ruhlen.  Ruhlen wishes to embrace the conclusion `All languages
>are related.'

As I have understood Joseph Greenberg's clearer and more cogent
statements, his own work actually does NOT propose to prove any such
conclusion.  It is rather an ASSUMPTION that all languages are or might
be related (i.e. we are not to exclude that).
Greenberg's method of comparison serves
to find the CLOSEST resemblances (merely that, CLOSEST).
In the Americas, his method leads him to the conclusion (no surprise)
that Eskimo-Aleut is not closely related to any other Amerindian languages,
and that Athabaskan / Na-Dene (with outliers)
is not closely related to any other Amerindian languages
(though conceivably not as distant from them as Eskimo-Aleut?).

His actual conclusions are about relative UNRELATEDNESS of language
families (notice, not about absolute unrelatedness, which he does not claim
his method has the power to evaluate).
Beyond that, Greenberg's methods do NOT enable him to establish
any similar degree of unrelatedness among the remaining languages of
the Americas.
I hope I have stated that carefully enough, to make obvious that it is
a matter of degree, not absolutes, and that Greenberg's method actually
demonstrates the points of SEPARATION rather than the points of UNION.

Greenberg's method is potentially useful in that it is likely to reveal
some deep language family relationships
which were not previously suspected,
AS LONG AS we do not introduce systematic biases
which overpower whatever residual similarities still exist
despite all of the changes which obscure those deep relationships.
In other words, mere noise in the data, or dirty data,
if the noise or dirt is random, should not be expected to selectively
bias our judgements of closeness of resemblance...
[and we can study how we make such judgements, to strengthen
this component of Greenberg's method and make it more robust
against noisy data and our own failings of judgement]
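
The point about random versus systematic error can be illustrated with
a toy simulation. This is only a sketch under stated assumptions, not
Greenberg's procedure: the family names, the resemblance scores, the
noise level, and the size of the bias are all hypothetical numbers
chosen for illustration. Zero-mean random noise is added to each
judgement and averaged over many judgements; the ranking of closeness
survives. A one-sided bias applied to a single family does not wash
out, and it reorders the ranking.

```python
import random

def mean_noisy_scores(true_scores, trials=2000, noise=0.05, bias=None, seed=0):
    """Average each family's resemblance score over many noisy judgements.

    Every judgement adds zero-mean uniform noise in [-noise, +noise].
    `bias` optionally adds a constant, one-sided error to named families.
    All values are hypothetical; nothing here is real linguistic data.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    bias = bias or {}
    totals = {name: 0.0 for name in true_scores}
    for _ in range(trials):
        for name, score in true_scores.items():
            totals[name] += score + rng.uniform(-noise, noise) + bias.get(name, 0.0)
    return {name: total / trials for name, total in totals.items()}

def ranking(scores):
    """Family names ordered from closest to most distant resemblance."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical "true" closeness of three families to some target language.
TRUE_SCORES = {"family_A": 0.30, "family_B": 0.20, "family_C": 0.10}

random_only = ranking(mean_noisy_scores(TRUE_SCORES))
with_bias = ranking(mean_noisy_scores(TRUE_SCORES, bias={"family_C": 0.25}))

print("random noise only:", random_only)
print("systematic bias:  ", with_bias)
```

With random noise alone the order stays A, B, C; with the systematic
bias, family_C moves to the front. The design choice worth noting is
the averaging: a single noisy judgement can mislead, and it is only
over many judgements that random error cancels while bias does not.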

Back to Trask:

>Now, in order to go about this, I maintain, [Ruhlen] should start with the
>negation of this statement as his null hypothesis, and then go on to
>show that there is so much evidence against this null hypothesis that it
>is untenable and must be rejected.  But that's not what he does.

The last paragraph above is in complete contradiction to what Larry Trask
says he agrees with ("I fully agree"...).
If one believes it is not possible to test a proposition,
then it is NOT REASONABLE to ask anyone else to test it.
One cannot have this both ways.

>Instead, he *starts* with the hypothesis `All languages are related',
>and then proceeds to assemble what he sees as evidence in support of
>this last hypothesis.  Amazingly enough [;-)]. he is able to find such
>evidence.

So far, this is legitimate in principle [but in practice, see below]
IF the purpose is to establish the plausibility of a hypothesis
(as distinct from testing it, NOTICE!).
This is how almost all hypotheses are first established as hypotheses,
simply by accumulating suggestive, anecdotal, case-study evidence,
in contexts in which we do not even know how to estimate chance
very well.

>He therefore declares that, because he has found evidence in
>support of his desired conclusion, it must be true.  But this is
>completely wrongheaded.

Here I agree with Trask, to the extent Ruhlen says something like this.
(I am much less familiar with Ruhlen than with Greenberg.)

>What Ruhlen *must* do, if he wants to persuade anybody, is not to try to
>demonstrate that his favored conclusion is supported by evidence, but
>rather that its contradictory -- the appropriate null hypothesis -- is
>so strongly disconfirmed that it cannot be maintained.

The contradictory of the strong claim (all related) is that there are at least
two languages which are not related to each other genetically.
I would doubt that Ruhlen had evidence to exclude this possibility,
or that if asked clearly, he would say so.  After all (trivially) there are
languages for which there are only one or two words attested,
and one can go on from there with very little work to find other cases
where I think Ruhlen would grant that the data themselves do not
establish even a loose probability of any relationship.

[Trask's example All Swans are White not repeated here, but ...]

>This fundamental failure to understand proper methodology is enough to
>render Ruhlen's work vacuous,

Not so, since Ruhlen can be treated as involved in hypothesis FORMATION,
not hypothesis testing.

>quite apart from the vast number of
>egregious errors in the material he cites as evidence,

Now THAT is quite another matter.  When such errors are present in very
large quantity, and are not merely slight differences from the analysis
an expert in a particular language would offer but more serious, complete
misunderstandings which vitiate any use of particular data,
they do discredit the work as a whole, and can quite legitimately,
even without absolute proof of its wrong-headedness,
lead reasonable people to pay no more attention to it.
But note carefully the caveat above.  It is NOT sufficient merely to
provide minor improvements of detail to the presentation,
to discredit the work.  An expert can ALWAYS provide minor
improvements.  That itself shows nothing at all.

>and quite apart
>from his failure to realize that lookalikes do not constitute evidence
>of any kind.

I disagree flatly, unless the term is defined circularly so that
"lookalikes" means more than it says, namely
"lookalikes which are known not to be cognates".

If it actually means "items which look alike in sound and meaning",
then of course such comparisons DO constitute PRELIMINARY evidence.
Any such preliminary evidence can be discounted by showing
that the resemblances are secondary and late,
or that they manifest a type of sound symbolism,
or in other ways.
It was lookalikes in grammar and vocabulary which led to
the original hypothesis of the relatedness of the Indo-European
languages.  Some of these turned out to be true cognates,
some turned out not to be cognates, merely chance lookalikes.
But the IE hypothesis thus preliminarily established withstood the
discounting of some of the lookalikes as non-cognates
and the reaffirmation of others as true cognates (whatever the
terminology used at the time).

Once again, I wish to urge us back to the FACTS.
And those FACTS include whatever we can establish about how
each of our tools works, where it works well and where it fails,
how deep historically each tool can push us with languages
of certain types or with language changes of certain types,
and whatever we can establish about new tools we have not yet
systematically used (such as explicit paths of historical
change in sound systems and in semantic spaces, and metrics
of distances along such paths of change...).

We get nowhere by repeating the discrediting of STRAW MAN
claims, by holding hypothesis formation to standards of absolute
hypothesis testing, by counting minor corrections and improvements
to data as completely discrediting use of the data when they do not,
and so forth.

The field is at an impasse in these discussions,
until we return the discussion to an empirical basis.
Pure philosophy will not get us much progress.

Sincere best wishes,
Lloyd Anderson
Ecological Linguistics