STATISTICS IN LINGUISTICS

Tue Jan 26 15:11:12 UTC 1999

[ moderator re-formatted ]

Dear John and IEists:

-----Original Message-----
From: Dr. John E. McLaughlin and Michelle R. Sutton <mclasutt at brigham.net>
Date: Tuesday, January 26, 1999 12:03 AM

>Patrick C. Ryan wrote:

>> And that, appreciatedly candid Claire, is what invalidates linguistics
>> and those who presently practice it. Guesswork by Researcher A is not
>> necessarily of the quality of guesswork by Researcher B. The only
>> solution is to have a consensus on what the proper methodology is for
>> calculating odds that will show that Researcher A's brilliant guesses are
>> statistically probable, and expose B's as professorial humbug.

>Pat, you've said this a couple of times now (on a couple of different lists),
>but I must correct you.  Historical linguists do not rely on statistics to
>prove or disprove the validity of a given theory of relatedness.  They rely on
>PREDICTABILITY.  That's what regular sound correspondences are all about.  For
>example, Jakob Grimm described the relationship between the consonants of
>German and Proto-Indo-European.  I can now take his theory and see if it
>works.  I take the word 'father' and Grimm's laws (I'll include Verner's here
>too) tell me that the German word will also start with [f], will have a [d] in
>the middle, and end with an [r].  And sure enough it does.  I take the word
>'father' and run the rules in the other direction and I can predict that the
>Latin word will start with [p], have a [t] in the middle, and still end with
>[r].  Right again.  If I can do this with form after form after form, then the
>sound correspondences are reliable and the genetic relationship postulated is
>confirmed.

What is somewhat frustrating is that you do not seem to realize that this is
a statistical relationship; if what you are saying were absolutely true, we
would be entitled to say that the probability was 100 out of 100. But I
think you know that these laws, while very reliable, do not *always* yield
the anticipated forms.
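
To make the point concrete, here is a minimal sketch of what counting
"hits" for a set of correspondences could look like. Everything in it is an
assumption for illustration: the names RULES, predict and hit_rate are
hypothetical, the only correspondences taken from the 'father' example above
are p -> f, t -> d and r -> r, and the test pairs are made-up placeholders,
not real lexical data.

```python
# Toy correspondence rules (source consonant -> expected Germanic consonant).
# p->f, t->d (Grimm plus Verner) and r->r follow the 'father' example quoted
# above; the table is deliberately incomplete and purely illustrative.
RULES = {"p": "f", "t": "d", "r": "r"}


def predict(skeleton):
    """Apply the rules to a consonant skeleton such as 'ptr'.

    Consonants with no rule are passed through unchanged.
    """
    return "".join(RULES.get(c, c) for c in skeleton)


def hit_rate(pairs):
    """Return the fraction of (source, attested) pairs predicted correctly."""
    hits = sum(1 for source, attested in pairs if predict(source) == attested)
    return hits / len(pairs)


# Made-up test set: the first pair is the 'father' skeleton; the other two
# are placeholders ('x' stands for any consonant without a rule), and the
# last one deliberately fails to follow the rules.
pairs = [("ptr", "fdr"), ("xtr", "xdr"), ("pxt", "fxt")]

print(predict("ptr"))
print(hit_rate(pairs))
```

A rate of exactly 1.0 over a large test set would be the "100 out of 100"
case; anything lower is a reliability figure, and that is all I mean by
calling the relationship statistical.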

>If, on the other hand, I postulate a linguistic relationship with a few
>sound correspondences, but those sound correspondences offer no predictive
>power beyond the few dozen forms I cite as evidence, then that linguistic
>relationship cannot be considered proven.  It will always be considered only a
>hypothesis.

I agree wholeheartedly.

But the crucial difference between your conception of this question and mine
is that I believe the sound correspondences *might* prove to have predictive
power, if linguists would dismount from their a priori horses and actually
test them.

>A good example of this is Whorf and Trager's Aztec-Tanoan family.  Beyond
>their few dozen examples, no one has ever been able to use their sound
>correspondences to find any more forms in either Uto-Aztecan or Tanoan that
>fit the rules.  It's a dead end.  Therefore, the relationship is considered to
>be suggestive, but no more.  Not proven by any stretch of the imagination.

No reasonable person, and I hope I am one, could disagree with that.

>It's not statistics, it's correspondences and predictive power.

Predictive power, IMHO, is based on statistics.

Pat