# STATISTICS IN LINGUISTICS

X99Lynx at aol.com
Sat Jan 30 02:39:41 UTC 1999

```I wrote:
<<Predictive power is based on an effective understanding of cause and effect
relationships.>>

Patrick C. Ryan replied:
<<Why does it seem to be so hard to understand that the relationship of cause
and effect is statistical?>>

Because it is not true.

The relationship of cause and effect is physical or chemical or cultural, etc.
Statistics can be extremely useful in establishing such relationships.  But
I'm afraid statistics do not equal cause and effect.

<<If you have Cause A, and you predict successfully Effect B, every time, then
the relationship is <1> or 100% PROBABILITY.>>

You are definitely jumping the gun here.  You are already telling me there is
a cause and effect relationship BEFORE YOU'VE PROVEN IT.  You are presuming
cause and effect before you have statistically shown it.

The best you can say here is that if A occurs and then B occurs, every time,
there is some probability that A causes B.

HOWEVER, if your assumptions are flawed, you are not proving cause and effect
with this.  All that this demonstrates is a 100% CORRELATION. But NO cause and
effect relationship has been established.  And this should not be hard to
understand.

The classic classroom example is: EVERY TIME you see people carrying umbrellas,
it ends up raining.  Based on that, you conclude that umbrellas cause rain.
(Every time equals "100% probability.")

Even a very high correlation does not equal causation.  This is very important
in a field like historical linguistics, where you do not have an independent
variable to manipulate and therefore don't have the hard experimental controls
you get in a lab.  With improper analysis, statistics are not just worthless.
They are damaging.

And of course the other thing that is inaccurate is "100% probability".  Until
there is an end of time, there is no such thing, because no matter how many
"n" times A leads to B, there is always "n + 1."  If you want to claim it, the
best you get is 99% in this world.

As far as historical linguistics goes, statistical analysis could be a very
powerful tool.  But it is only a tool.  And if its limitations are
misunderstood, it can be and has been used to prove all kinds of nonsense.

Regards,
Steve Long

```
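The umbrella example can be sketched in a few lines of Python. This is only a toy illustration, not anything from the original exchange: the `dark_clouds` data, the `world` function, and the idea that clouds drive both umbrellas and rain are all assumptions made up for the sketch. It shows a perfect correlation between umbrellas and rain in the observed data, and then shows that manipulating umbrellas directly (the independent variable a lab would give you) leaves the rain unchanged.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def world(dark_clouds, force_umbrellas=None):
    """Hypothetical toy world: clouds are the common cause.

    People carry umbrellas when the sky is dark, unless the experimenter
    forces a fixed value. Rain depends only on the clouds.
    """
    if force_umbrellas is None:
        umbrellas = list(dark_clouds)
    else:
        umbrellas = [force_umbrellas] * len(dark_clouds)
    rain = list(dark_clouds)
    return umbrellas, rain


# Made-up daily records: 1 = yes, 0 = no.
dark_clouds = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

# Observation only: umbrellas and rain move together every time.
umbrellas_seen, rain_seen = world(dark_clouds)
observed_r = pearson(umbrellas_seen, rain_seen)

# Intervention: everyone carries an umbrella every day.
umbrellas_forced, rain_forced = world(dark_clouds, force_umbrellas=1)

rain_rate_seen = sum(rain_seen) / len(rain_seen)
rain_rate_forced = sum(rain_forced) / len(rain_forced)

print("observed correlation:", observed_r)
print("rain rate, observed:", rain_rate_seen)
print("rain rate, umbrellas forced:", rain_rate_forced)
```

In this toy world the correlation comes out perfect, yet forcing umbrellas on everyone does nothing to the rain rate. Historical data offers no such intervention, so the correlation alone cannot settle which story is true.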