Hypothesis formation vs. testing

Sun Aug 22 15:43:40 UTC 1999

On Thu, 19 Aug 1999 ECOLING at aol.com wrote:

>There is a great confusion in the advanced sciences,
>or even in those which like to believe themselves advanced,
>between
>Hypothesis Testing
>and
>Hypothesis Formation.

And part of this confusion arises from not knowing what
hypothesis testing (or possibly a hypothesis) is.  In trying to
define "null hypothesis" on Sat, 7 Aug 1999, ECOLING at aol.com
wrote:

     I respectfully disagree with the person who recently argued
     between the following two alternatives that either could be
     an appropriate "null hypothesis" (...)

     >   No languages are related.
     ><snip>
     >   All languages are related.

     Rather, the real null hypothesis is something like

     "We do not know whether all languages are related
     (or whether there was polygenesis)" (etc.)

     Any attempt to force anyone else to accept something that
     embodies a CLAIM (...), is MANIPULATING the discourse,
     instead of dealing with facts.

which shows a spectacular lack of knowledge of what a hypothesis
is.  It was subsequently pointed out that "We do not know whether
all languages are related" is not a hypothesis, to which
ECOLING at aol.com replied on Thu, 12 Aug 1999:

     There may be more agreement than seems to be the case in
     this matter.  In particular I would agree with the
     following:

     >"We don't know" won't suit for those
     >purposes, because it is not a hypothesis.

     is very similar to my position, that "We cannot currently
     know" is the null hypothesis, and "We can currently know" is
     the hypothesis to be tested.  And, quite obviously, no one
     can succeed in establishing that "we do know" is valid, with
     current data and tools, applied to the question whether all
     languages are ULTIMATELY related.
     So we know the answer to THAT test, for now and for a long
     time to come.

which simply ignored the point at issue.  "We don't know" is not
a hypothesis; it is a fact.  So Lloyd has restated it as what he
thought was a hypothesis (which by his own definition means that
he is trying to MANIPULATE the discourse by making a CLAIM :>),
but this is not really a hypothesis either.  There is no
particular semantic difference between "we don't know" and "we
can't tell", so this is just another fact, not a hypothesis, and
it is not at all suited to be a null hypothesis.

Null hypothesis is a technical term in hypothesis testing and
here is a definition of null hypothesis taken from the glossary
of a textbook on statistical methods available on the web
(http://www.stat.Berkeley.EDU/users/stark/SticiGui/Text/gloss.htm#null_hypothesis):

  Null hypothesis.

    In hypothesis testing, the hypothesis we wish to falsify on
    the basis of the data.  The null hypothesis is typically that
    something is not present, that there is no effect, or that
    there is no difference between treatment and control.

So "We can't tell" is no more appropriate as a null hypothesis
than "We don't know" is.  A fact can be either true or false, but
that still doesn't make it a hypothesis.

The null hypothesis rests on the same logic as Karl Popper's
observation that since it is possible to disprove an inductive
hypothesis by counter evidence, but impossible to "prove" an
inductive hypothesis by simply amassing more evidence in favor of
it, it is often more effective to establish a hypothesis by
disproving some rival hypothesis.  In the days before hypothesis
testing became a science, the null hypothesis would have been
called a straw-man argument -- something that is put forward
simply to demolish so that its destruction makes the real
argument more likely.

But a scientific hypothesis must make a CLAIM, because that's
what a scientific hypothesis is:  a proposed explanation for some
group of observed data (facts).  Here are some definitions of
hypothesis taken from the web:

     http://www.writedesignonline.com/organizers/hypothesize.html

     hypothesis - 1. a proposition, or set of propositions, set
     forth as an explanation for the occurrence of some specified
     group of phenomena, either asserted merely as a provisional
     conjecture to guide investigating (working hypothesis) or
     accepted as highly probable in the light of established
     facts.  2. a proposition assumed as a premise in an
     argument. 3. the antecedent of a conditional proposition.
     4. a mere assumption or guess.

So it can be seen that a scientific hypothesis (definition 1
above) CLAIMs to explain some specified group of phenomena.  This
is not an attempt to MANIPULATE the discourse.  This *is* dealing
with facts.  And it is the responsibility of the proponent of a
hypothesis to demonstrate its validity, either by an overwhelming
amount of evidence in its favor or by falsifying all rival
hypotheses.  If there is no rival hypothesis that is capable of
being disproved, then the original hypothesis is said to be
non-falsifiable and falls outside the realm of science.  It is a
"just-so" story (fairy tale) and this is why "all languages are
related" is inappropriate as a null hypothesis (although not
necessarily as a working hypothesis); it is not falsifiable.  If
there is no evidence offered that the original hypothesis is the
true explanation of the observed facts, then it is not a
scientific hypothesis; it is merely a hypothesis according to
definition 4 above.

>When we cannot conclusively test certain hypotheses, it is still
>legitimate to try to accumulate evidence that the hypotheses are
>plausible and worth exploring further.

Certainly, UFOologists and seekers after Atlantis and Noah's Ark
are doing this all the time.  And who knows, someday they may be
successful.  But I would feel a lot more secure with this
statement if ECOLING at aol.com had not continued his message of
Sat, 7 Aug 1999 with:

     If we don't know, then we don't know, it's as simple as
     that. I personally don't give a hoot what anyone wants to
     "assume", or tell me to assume, in the absence of data
     justifying such an assumption.

Okay, I get the message.  You don't give a hoot about what anyone
else assumes but you don't mind telling everybody else what to
assume.  It's not a question of data.  Presumably everybody has
the same data.  The problem is that for any set of data there are
an infinite number of hypotheses that can account for the data.

And then:

     Using a "burden of proof" argument is merely a way of trying
     to get someone to accept a conclusion in the absence of
     evidence.

No, using a burden of proof argument is a way of trying to get
someone to provide evidence to support his conclusion or CLAIM.
An unsupported hypothesis (no proof offered) is "a mere
assumption or guess" or a "provisional conjecture to guide
investigating".  The burden of proof is always with the one who
makes a CLAIM to have an explanation for some observed facts that
is different from the explanation that does not require proof
(the null hypothesis).  In American justice the null hypothesis
is "innocent" (used to be, anyway).  It is up to the prosecution
to prove "beyond reasonable doubt" any charge brought.  It is
not, as in some countries, up to the defendant to prove his
innocence.  Innocence is assumed (or is supposed to be) until
guilt is proved.  Similarly, in comparative linguistics, given
two similar sounding words with similar meanings in different
languages, the null hypothesis is "coincidence".  Coincidence is
always possible and hence does not have to be proved.  Anyone who
wants to CLAIM that these words show a relationship between the
two languages has to offer proof.
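
One way to make "coincidence" an explicit, falsifiable null
hypothesis is a permutation test:  count the lookalikes between
two word lists matched by meaning, then shuffle one list many
times to see how often chance alone does as well.  The following
Python sketch is only an illustration.  The word lists are
invented toy forms (not data from any real language), "lookalike"
is crudely assumed to mean "same initial letter", and the function
names are hypothetical.

```python
import random

# Invented toy word lists, one word per meaning slot; not real data.
TOY_A = ["pata", "mano", "tiku", "kala", "suno", "nara", "lemi", "rupo"]
TOY_B = ["pedo", "mira", "taku", "kobe", "sila", "nuku", "lota", "rami"]
TOY_C = ["kobe", "mira", "pedo", "taku", "nuku", "sila", "rami", "lota"]

def initial_matches(words_a, words_b):
    """Count meaning slots where both words start with the same letter."""
    return sum(a[0] == b[0] for a, b in zip(words_a, words_b))

def chance_p_value(words_a, words_b, trials=2000, seed=0):
    """Estimate how often a shuffled pairing matches as well as the real one.

    Returns (observed_matches, estimated_p_value).  The null hypothesis
    is that the observed matches are coincidence.
    """
    rng = random.Random(seed)
    observed = initial_matches(words_a, words_b)
    shuffled = list(words_b)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the link between form and meaning
        if initial_matches(words_a, shuffled) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate from ever being exactly zero.
    return observed, (at_least + 1) / (trials + 1)

obs_ab, p_ab = chance_p_value(TOY_A, TOY_B)
obs_ac, p_ac = chance_p_value(TOY_A, TOY_C)
```

With these toy lists, TOY_A against TOY_B matches in all 8 slots,
which shuffling almost never reproduces, so coincidence can be
rejected.  TOY_A against TOY_C matches in 1 slot, which shuffling
reproduces more often than not, so coincidence stands.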

     Only facts are relevant, facts which could make one
     conclusion more probable than another, (facts which DO NOT
     have anything to do with our own mental convenience, not
     EVEN with assumptions that nature is simple in some way we
     mentally want her to be, when she may in this particular
     respect NOT be simple).

But it is not only the facts that are relevant ("just the facts,
ma'am" :>).  The data are presumably the same for everyone.  But
a fact is not data.  It is an observation about data.  Two people
looking at the same data might come up with different facts.

Scientific discourse involves two separate areas (and they are
not hypothesis formation and hypothesis testing):

     (1) factual statements about data, which rest on
     observations and which are either true or false, and (2) a
     hypothesis, a statement put forward in explanation of the
     facts.  A fact is an empirically verifiable statement about
     phenomena in terms of a conceptual scheme; a fact is not an
     object in nature but a statement about nature.  The
     hypothesis must be formulated so that it can be shown to
     be either inadequate [falsifiable] or substantiated to a
     high degree of probability by further facts.  A single
     contrary case may disprove the hypothesis, although it need
     not.  The hypothesis must be based on the facts available,
     and the facts should not be made to fit the hypothesis.  The
     problem is that even the establishment of facts can be
     highly controversial in genetic linguistics, as in other
     sciences.

     Raimo Anttila, _Historical and Comparative Linguistics_, p. 23

I have chosen to quote Anttila here because he has a very pithy
summary, and from this I can make an even starker summary:  Facts
are observations about data.  The data are the same for everyone
(exist in nature); facts may not be the same for everyone.  A
hypothesis is an attempt to explain the observed facts.

As Anttila states, there are two things that make a sound
hypothesis:  It can be shown to be inadequate (i.e., it has a
valid test for falsification) and it can be substantiated to a
high degree of probability by further facts (facts that were not
part of the basis of the original hypothesis).  Saying that a
hypothesis is valid because it accounts for the original facts is
not a test of a hypothesis.  It is simply circular reasoning (the
hypothesis is created to account for the facts; the hypothesis
must therefore be correct because it accounts for the facts).
For any set of facts, there are an infinite number of hypotheses
that will account for them.  Some will just be more believable
than others.
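
To make the circularity concrete, here is a minimal Python
sketch.  It is an illustration under stated assumptions, not
anyone's actual method:  the word pairs are invented toy forms
(not data from any real language), the "hypothesis" is just a
table of initial-letter correspondences, and the names form_rule
and check_rule are hypothetical.

```python
# Invented toy word pairs (language A form, language B form); not real data.
TOY_PAIRS = [
    ("pata", "fata"), ("piro", "firo"),
    ("tuma", "suma"), ("tesi", "sesi"),
    ("kano", "hano"), ("kilu", "gilu"),
]

def form_rule(pairs):
    """Form a hypothesis: map each initial letter in A to the one seen in B."""
    return {a[0]: b[0] for a, b in pairs}

def check_rule(rule, pairs):
    """Return (hits, covered): pairs the rule predicts, out of those it covers."""
    covered = [(a, b) for a, b in pairs if a[0] in rule]
    hits = sum(rule[a[0]] == b[0] for a, b in covered)
    return hits, len(covered)

formation = TOY_PAIRS[0::2]  # facts the hypothesis is built from
held_out = TOY_PAIRS[1::2]   # further facts, kept aside for testing

rule = form_rule(formation)
circular = check_rule(rule, formation)  # cannot fail here: rule was read off these pairs
genuine = check_rule(rule, held_out)    # can fail, so it is a real test
```

With these toy pairs the circular check scores 3 of 3 while the
held-out check scores 2 of 3.  Only the second number tells us
anything about the hypothesis.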

>In a message dated 8/18/99 11:32:41 PM, Larry Trask writes:

>>On Thu, 12 Aug 1999 ECOLING at aol.com wrote:

>>> No Burden of Proof is appropriate on the content of the
>>> question whether all languages are ultimately related, simply
>>> because we cannot test that question currently.

>>I fully agree that the question `Are all languages related?'
>>cannot be answered at present.  I further believe that we will
>>never be able to answer this question by purely linguistic means.

>>However, there are people who disagree, one of the most
>>prominent being Merritt Ruhlen.  Ruhlen wishes to embrace the
>>conclusion `All languages are related.'

>As I have understood Joseph Greenberg's clearer and more cogent
>statements, his own work actually does NOT propose to prove any such
>conclusion.  It is rather an ASSUMPTION that all languages are or might
>be related (i.e. we are not to exclude that).

Yes, Greenberg is clearer and more cogent than Ruhlen.  And it is
proper methodology to assume something that cannot be proved to
see where it leads.

<snip>

>Back to Trask:

>>Now, in order to go about this, I maintain, [Ruhlen] should
>>start with the negation of this statement as his null hypothesis,
>>and then go on to show that there is so much evidence against
>>this null hypothesis that it is untenable and must be rejected.
>>But that's not what he does.

>The last paragraph above is in complete contradiction to what
>Larry Trask says he agrees with ("I fully agree"...).
>If one believes it is not possible to test a proposition, then it
>is NOT REASONABLE to ask anyone else to test it.
>One cannot have this both ways.

You are comparing a statement about what is possible with one
about what is proper methodology, saying that one contradicts the
other, and blaming this on Larry Trask.  This is simply an
attempt at killing the messenger.  If proper methodology requires
that an investigator perform an impossible act and the
investigator chooses to ignore this, then this speaks to the
investigator's methodology, not to the methodology of the person
who points this out.  It is NOT REASONABLE to ask the
investigator to perform an impossible act.  But it is REASONABLE
to point out that omission of this act makes the investigation
methodologically suspect.  If the investigator cannot perform a
methodologically necessary act then he should not speak and act
as if it were not necessary to perform the act.  One cannot have
this both ways.

>>Instead, he *starts* with the hypothesis `All languages are
>>related', and then proceeds to assemble what he sees as evidence
>>in support of this last hypothesis.  Amazingly enough [;-)], he
>>is able to find such evidence.

>So far, this is legitimate in principle [but in practice, see below]
>IF the purpose is to establish the plausibility of a hypothesis
>(as distinct from testing it, NOTICE!).

No, this is not even legitimate in principle.  A hypothesis is
something put forward as an explanation for observed facts.  It
is not something that one proposes and then goes out and looks
for facts to support.  One creates a hypothesis to account for
facts (all the facts).  One does not select and arrange the facts
to fit the hypothesis.

>This is how almost all hypotheses are first established as
>hypotheses, simply by accumulating suggestive, anecdotal,
>case-study evidence, in contexts in which we do not even know how
>to estimate chance very well.

It is true that many (most?) scientific hypotheses start as
guesses.  But not every guess is a scientific hypothesis.  While
anything is possible, not everything is probable.  Basically, no
evidence, no scientific hypothesis.

And whatever happened to your:

     Only facts are relevant, facts which could make one
     conclusion more probable than another, (facts which DO NOT
     have anything to do with our own mental convenience, not
     EVEN with assumptions that nature is simple in some way we
     mentally want her to be, when she may in this particular
     respect NOT be simple).

>>He therefore declares that, because he has found evidence in
>>support of his desired conclusion, it must be true.  But this is
>>completely wrongheaded.

>Here I agree with Trask, to the extent Ruhlen says something like
>this.

Then what is the problem?  Why this diatribe about the difference
between hypothesis testing and hypothesis formation?  If creating
a hypothesis ("a mere assumption or guess" or "a provisional
conjecture to guide investigating") and then (selectively)
collecting evidence to support it and then claiming that the
hypothesis is proved because there is evidence to support it is
the reverse of proper scientific methodology, then what makes the
selection and arrangement of the facts collected for this purpose
the formation of a hypothesis?  Hypothesis formation is not the
collecting of data.  Hypothesis formation is an attempt to
account for the data.  If the hypothesis is formulated in the
absence of data, collecting data to support the hypothesis is not
hypothesis formation, it is just bad science.  If one tries to
say "isn't it lucky that the evidence collected already has a
hypothesis to explain it" this is simply ignoring the fact that
there are an infinite number of hypotheses to account for any
collection of facts.

>(I am much less familiar with Ruhlen than with Greenberg.)

Then you should perhaps familiarize yourself with his methodology
before you try to defend it.

>>What Ruhlen *must* do, if he wants to persuade anybody, is not
>>to try to demonstrate that his favored conclusion is supported by
>>evidence, but rather that its contradictory -- the appropriate
>>null hypothesis -- is so strongly disconfirmed that it cannot be
>>maintained.

>The contradictory of the strong claim (all related) is that there
>are at least two languages which are not related to each other
>genetically.
>I would doubt that Ruhlen had evidence to exclude this
>possibility, or that if asked clearly, he would say so.  After
>all (trivially) there are languages for which there are only one
>or two words attested, and one can go on from there with very
>little work to find other cases where I think Ruhlen would grant
>there is not even a loose probability based on the data itself to
>establish any relationship.

Ruhlen would grant nothing of the kind.  The relatedness of all
languages is not an issue for Ruhlen.  The following is Ruhlen's
position on the matter (this quotation was posted by Larry Trask
to another list; I have not verified it [our library has none of
Ruhlen's books], but I have no reason to doubt the accuracy of
the quotation):

  First, the search for linguistic `relationships' is now over
  (or should be), since it no longer makes sense to ask if two
  languages (or two language families) are related.  *Everything*
  is related, and the question to be investigated within or among
  different families is the *degree* of their relationship, not
  the fact of it.  [emphasis in the original]

  Merritt Ruhlen (1994), On the Origin of Languages, Stanford UP,
  p. 272.

>[Trask's example All Swans are White not repeated here, but ...]

>>This fundamental failure to understand proper methodology is
>>enough to render Ruhlen's work vacuous,

>Not so, since Ruhlen can be treated as involved in hypothesis
>FORMATION not hypothesis testing.

On the contrary, Ruhlen is not formulating a hypothesis (we all
should know how to do this by now:  one starts with the data and
then develops a hypothesis that accounts for the data [all the
data]).  What he is doing is precisely what you railed against
when you said:

     Any attempt to force anyone else to accept something that
     embodies a CLAIM (...), is MANIPULATING the discourse,
     instead of dealing with facts.

And if you really meant it when you said:

     If we don't know, then we don't know, it's as simple as
     that. I personally don't give a hoot what anyone wants to
     "assume", or tell me to assume, in the absence of data
     justifying such an assumption.

then are you now saying that it's all right to "assume" so long
as you can find some data that justify the assumption (while
ignoring data that don't)?  I really would not expect you to
claim this.

If Ruhlen can be considered to be involved in hypothesis
FORMATION not hypothesis testing, then the hypothesis being
formed is simply a hypothesis according to definition 4 above
("a mere assumption or guess"), not a hypothesis according to
definition 1 (a scientific hypothesis).  Collecting data is not
hypothesis formation; it is a prerequisite to hypothesis
formation.  And the data on which a hypothesis is based cannot be
used to test the hypothesis because that is simply circular.

>>quite apart from the vast number of egregious errors in the
>>material he cites as evidence,

>Now THAT is quite another matter, and when present in very large
>quantity, not merely slight differences from the analysis an expert in
>a particular language would offer but more serious, complete
>misunderstandings vitiating completely any use of particular data...
>it does discredit the work as a whole, and can quite legitimately,
>even without absolute proof of its wrong-headedness,
>lead reasonable people to pay no more attention to it.
>But note carefully the caveat above.  It is NOT sufficient merely to
>provide minor improvements of detail to the presentation,
>to discredit the work.  An expert can ALWAYS provide minor
>improvements.  That itself shows nothing at all.

>>and quite apart from his failure to realize that lookalikes do
>>not constitute evidence of any kind.

>Disagree flatly, unless defined circularly so that "lookalikes"
>means more than it says, namely so that it means "lookalikes
>which are known to be unrelated as cognates".

You can disagree as much as you want, but that won't change
reality.  Lookalikes are not evidence of anything except the
trivial fact that the lookalikes exist.  It requires evidence to
show that lookalikes demonstrate a relationship between the
languages in which they occur.  Lookalikes can always be
coincidence, even in languages that are known to be related.  So
without evidence to the contrary, lookalikes are coincidence just
as an individual is innocent until proved guilty.  It is the null
hypothesis.  The lookalikes may be a fact, but a fact is not a
hypothesis (I hope this is clear by now).  Some hypothesis must
be presented to account for their presence (and the hypothesis
that "all languages are related" is not adequate because it is
non-falsifiable; one might as well account for them by saying
"Santa Claus left them there").

>If it actually means "items which look alike in sound and
>meaning", then of course such comparisons DO constitute
>PRELIMINARY evidence. Any such preliminary evidence can be
>discounted by showing that the resemblances are secondary and
>late, or that they manifest a type of sound symbolism, or in
>other ways.

To the extent that lookalikes are a fact and facts can be used as
evidence, this is true.  But the question is and remains:  what
are they evidence of?  The null hypothesis is that they are
evidence that similar sounding words with similar meanings occur
in the world's languages by pure chance.  That they are evidence
of anything else must be proved.  So your explication is exactly
backwards.  Attempts to prove that the existence of the
lookalikes demonstrates a relationship between the languages can
be discounted by the means you point out, but the lookalikes
still remain evidence (of the fact that chance resemblances do
occur).

>It was lookalikes in grammar and vocabulary which led to the
>original hypothesis of the relatedness of the Indo-European
>languages.  Some of these turned out to be true cognates, some
>turned out not to be cognates, merely chance lookalikes.
>But the IE hypothesis thus preliminarily established withstood
>the discounting of some of the lookalikes as non-cognates and the
>reaffirmation of others as true cognates (whatever the terminology
>used at the time).

This is quite true, and it was quite clearly pointed out in the
earliest identification of IE languages that the null hypothesis
was disproved ("a stronger affinity, ..., than could possibly
have been produced by accident").  The rest, as you say, has just
been details.

>Once again, I wish to urge us back to the FACTS.

And this is a good idea.  But FACTS are not hypotheses.  FACTS
are not even data.  FACTS are observations based on data.  FACTS
can be true or false.  It is, however, important to remember that
hypotheses are attempts to account for FACTS (all the FACTS, not
just selected FACTS) and that hypotheses must be judged by how
well they account for the FACTS and that hypotheses must include
a CLAIM (that the hypothesis correctly accounts for the FACTS).

>And those FACTS include whatever we can establish about how
>each of our tools works, where it works well and where it fails,
>how deep historically each tool can push us with languages
>of certain types or with language changes of certain types,
>and whatever we can establish about new tools we have not yet
>systematically used (such as explicit paths of historical
>change in sound systems and in semantic spaces, and metrics
>of distances along such paths of change...).

In essence, we really only have one tool.  It is called the
scientific method.  The scientific method is extremely simple:
1) a problem is identified, 2) relevant data are collected, 3) a
hypothesis is formulated to account for the data, 4) the
hypothesis is empirically tested.  Steps 1 and 2 do not have to
occur in this order.  The problem may emerge during the
collecting of data or may become apparent only after data have
been collected for some other reason.  But step 3 should be
preceded by 1 and 2 in whatever order they may come.  Formulating
a hypothesis and then collecting data to support it is not part
of the scientific method.  It is known as "speculating in advance
of the evidence" or "counting chickens before they hatch".
Finally, the hypothesis should have some rival hypothesis that
can be falsified before it can be considered scientific.

All the other tools that we have are hypotheses that have logical
consequents that can be empirically tested.  Their value lies in
the extent to which they are scientific hypotheses and are in
accord with the scientific method.  Otherwise we might as well
try to solve the problems of historical linguistics by gazing
into crystal balls.  I will not go into the tools of historical
linguistics at this point because this is long enough already and
an opportunity to discuss these tools and the pitfalls in using
them will doubtless occur soon.

>We get nowhere by repeating the discrediting of STRAW MAN claims,
>by holding hypothesis formation to standards of absolute
>hypothesis testing, by counting minor corrections and
>improvements to data as completely discrediting use of the data
>when they do not, etc. etc. and so forth.

It is not that we get nowhere; we just don't get where you want
to go.  Discrediting STRAW MAN claims (= null hypothesis) is part
of the method, as is requiring a falsifiable null hypothesis as
well as requiring that a hypothesis have a test for
falsification.  These things tell us whether a hypothesis is
scientific or not.  And although hypothesis formation and
hypothesis testing are two different things, they are not the
different things that you have made them out to be.  The FACTS
that are used in hypothesis formation cannot be used in
hypothesis testing or else the proof of the hypothesis rests on
circular reasoning.

>The field is at an impasse in these discussions, until we return
>the discussion to an empirical basis.

This is, as the Monty Python people would say, the nub of the
gist.  Historical linguistics suffers from (and has apparently
always suffered from) a surfeit of "bright ideas" and a dearth of
hard data.  Thus the 19th century (and indeed, much of the 20th)
saw a proliferation of "bright ideas" (stadialism, the search for
the "original language", primitive languages are spoken by
primitive peoples, primitive languages are lexically poor and
grammatically impoverished).  The majority had more to do with
nationalistic fervor and ethnic superiority (those same things
that brought us colonialism and the race for resources and
markets, WWI, WWII, and the more recent bouts of "ethnic
cleansing") than with linguistic actuality, and they have
subsequently been discredited by the simple accretion of more
data about more different languages.  Most of these "bright
ideas" were just speculation in advance of the evidence (or
hypothesis FORMATION without FACTS).

Only for IE (and to a lesser extent Semitic) is there sufficient
hard data.  This is because these families (or branches) have
written records going back a long way which make histories of the
languages possible that give us fixed points along the road and
because these areas have been the subject of linguistic study for
centuries.  Other areas do not have the same advantages or the
same collections of hard data, primarily because they have not
been studied as long.

Many people (particularly those who are not historical linguists)
point to the inadequacy of the tools used in historical
linguistics.  Although to me this sounds like a version of "the
poor workman blames his tools", the primary problem of historical
linguistics is not the inadequacy of the tools, but a lack of
hard data to use the tools on.  When the data are available, the
tools to investigate them will arise.

So I completely agree that linguists should primarily be engaged
in the collection of empirical data.  They should be out in the
field collecting hard data before it disappears rather than
sitting around thinking up yet more "bright ideas" that will have
to wait for more data before it can be seen whether they
accurately reflect the real linguistic world or not.

Unfortunately, not all linguists are suited by temperament or
training to be field linguists.  So they have to have something
to do while the rest are out collecting data.  Perhaps they could
be better employed by putting the data already available into
more usable form rather than speculating about what the data
being collected will reveal.  But speculating without evidence is
more fun than writing grammars or dictionaries.

>Pure philosophy will not get us much progress.

If by pure philosophy you mean hypothesis FORMATION without FACTS
then I agree.  If you mean pointing out shoddy methodology, then
I disagree.  The scientific method is pure philosophy.
Epistemology is a branch of philosophy, not a science (despite
the -ology on the end).  So it is pure philosophy that will set
the tone for linguistics and determine whether it is seen as a
scientific discipline or an attempt to prove causality by
correlation like astrology (also not a science despite the -ology
on the end).  But I'm willing to have a moratorium on hypotheses
without data.  How about you?

Bob Whiting
whiting at cc.helsinki.fi