STATISTICS IN LINGUISTICS
X99Lynx at aol.com
X99Lynx at aol.com
Thu Feb 4 05:43:06 UTC 1999
In a message dated 2/3/99 10:14:54 PM, you wrote:
<<Above, I said that the coincidence of A and B was 100%. That 100% defines
This is no true. That 100% is a correlation and nothing more. Coincidence
does not equal causality.
<<If A then B, every time, and there is no reason to think that will
ever change, then, whether we ever correctly understand the modality of the
causation, there is a causal relationship between A and B.>>
This is not true. A and B may be the independent effects of a common cause
with NO causal relation between them. You have no way of knowing that based on
this information alone. ( Every time neutrons appear, mu mesons follow.
Neutons do not however cause mu mesons.) "No reason to think this will ever
change" is not a valid way to make an inference in statistics. It is in not
one of the books on the subject, except as an unacceptible assumption.
>The classic classroom example is: EVERYTIME you see people carrying
>it ends up raining. Based on that, you conclude that umbrellas cause rain.
<<Sophomoric! The correct causal relationship is:
1. Whenever there is a perceived prospect of rain (A), people carry
You've jumped the gun again. ALL YOU KNOW in this example is that the
appearance of umbrellas are followed by rain. "Perceived prospects" are not
within the given observations. And, a bigger sin, you switched the dependent
variable. The question we were addressing is the "effect": rain.
Also you have missed a critical question that would answer our question using
statistics. The statement is that the appearance of umbrellas are always
followed by rain. You have not asked me if it ever rains when I don't first
see umbrellas. That would be what indicates that there are other variables
controlling rain than umbrellas.
To call this sophomoric may be correct. Actually freshman would be probably
<<Why is the prospect of analyzing linguistic data rigorously, employing
mathematical models, so frightening to you?>>
Given this little exercise, do I really have to answer that?
<<I can see why you prefer not to deal with mathematical models. If you have
100 trials, and the same cause has the same effect, the probability of the
cause creating the same effect again is 100%. Not 99%. Not 98%. Infinity is
not a factor in this equation.>>
All of this is completely not true. That is not how probability is
Remember that if you use probability to predict, you cannot ever be 100%
certain that the next event will match your prior events, no matter how many
incidences you have measured. And the number of incidences will affect your
percentage of certainty especially in a random distribution. All probability
is basically measured against random distribution.
By your logic, if I flip a coin three times and it is always heads, then "the
probability of the cause (flipping) creating the same effect (heads) is 100%."
I don't need to tell you that is not the probability of getting heads next
For a better idea of what those percentages mean, check a bell curve at the
far end and see what percentage is at the very farthest end. That is
probability at its extreme. I wouldn't get into infinity at this point.
<<There are no limitations to statistics.>>
There are some things that can only be submitted to statistical analysis after
they are properly analyzed. And there are many times that simply do not
provide isolatible variables. These are automatic limitations on
I don't think your wrong about the value of statistics, but I do believe that
if you try to apply it to a very specific problem, you will see that it does
have its ups and downs.
With your formidable knowledge of historical linguistics, you might try
analyzing a manageable, short-term statistical problem. Like possible
correlations with the occurences or non-occurence of a particular sound shift
in the written records of a very limited historical period. The excercise
might be sobering. On the other hand, you might also prove me wrong about all
More information about the Indo-european