STATISTICS IN LINGUISTICS

Lars Henrik Mathiesen thorinn at diku.dk
Tue Feb 9 17:00:31 UTC 1999


   Date: Fri, 05 Feb 1999 20:42:10 -0500
   From: "Brian M. Scott" <BMScott at stratos.net>

   Patrick C. Ryan wrote:

   > I can see why you prefer not to deal with mathematical models. If
   > you have 100 trials, and the same cause has the same effect, the
   > probability of the cause creating the same effect again is 100%.
   > Not 99%. Not 98%. Infinity is not a factor in this equation.

Pat, there are two types of probability --- observed probability (the
relative frequency in your trials) and `true' probability. You want to
use the former to estimate the latter and predict what will happen. You
cannot predict based solely on observed probability; this is an
extremely basic concept.

A correct statement would be that given the observation, the estimated
probability of the same cause creating the same effect is 100%.

That is, if you use the maximum likelihood estimator, which you will
find described within the first few chapters of any beginning
statistics text. The text will also tell you that this estimator is
totally worthless unless you have a good idea of the possible values
of the true probability.

In historical linguistics, we don't.

If your trial is flipping a coin, and you _know_ that the coin was
picked at random from a bag of 11 coins with probabilities of 0%, 10%,
..., 100% of getting heads, and you get 100 heads in 100 trials ---
then it's a very good bet that you got the 100% coin.
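
The bag-of-coins case can be worked through with Bayes' rule. Here is a
minimal sketch in Python, assuming each of the 11 coins is equally
likely to be drawn; the function name is made up for illustration.

```python
from fractions import Fraction

def posterior_after_all_heads(n_flips, biases):
    """Posterior probability of each coin after n_flips heads in a row,
    assuming every coin in the bag was equally likely to be picked."""
    # With a uniform prior, the posterior is proportional to the
    # likelihood of the data under each coin: bias ** n_flips.
    likelihoods = [bias ** n_flips for bias in biases]
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]

# Coins with heads probabilities 0%, 10%, ..., 100%.
biases = [Fraction(k, 10) for k in range(11)]
posterior = posterior_after_all_heads(100, biases)

for bias, prob in zip(biases, posterior):
    print(f"{float(bias):4.0%} coin: {float(prob):.6g}")
```

The 100% coin ends up with more than 99.99% of the posterior. Its
nearest rival, the 90% coin, gets only a few parts in 100,000. The
conclusion depends entirely on knowing what was in the bag.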

If you know nothing in advance, all you can do is reject theories
about the true probability that are inconsistent with your result.
Getting the same result in each of 100 trials is consistent (to a 99%
level of confidence) with any theory that calculates the true
probability of that outcome as 95.5% or more.
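
The 95.5% figure follows from asking which true probabilities p give
100 successes in a row a chance of at least 1%, i.e. p ** 100 >= 0.01.
A rough sketch, assuming independent trials (the function name is
hypothetical):

```python
def lowest_consistent_probability(n_trials, confidence=0.99):
    """Smallest true success probability p for which n_trials successes
    in n_trials independent trials is not rejected at this confidence."""
    alpha = 1.0 - confidence
    # All successes has probability p ** n_trials; p stays consistent
    # with the data as long as that is at least alpha.
    return alpha ** (1.0 / n_trials)

for n in (100, 10, 5):
    bound = lowest_consistent_probability(n)
    print(f"{n:3d} trials: p >= {bound:.3f}")
```

For 100 trials the bound comes out at about 0.955. With only 10 trials
it drops to roughly 0.63, and with 5 trials to roughly 0.40.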

So even if you get 100 out of 100 on Monday, you shouldn't be
surprised if you only get 91 out of 100 on Tuesday.
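
To put a number on the Tuesday result, one can take the borderline
theory (true probability 95.5%) and compute the chance of 91 or fewer
successes in 100 trials. This is only a sketch under that assumption,
with independent trials and an illustrative function name:

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

tail = prob_at_most(91, 100, 0.955)
print(f"P(91 or fewer out of 100 | p = 0.955) = {tail:.3f}")
```

The result is a few percent, above the 1% cutoff used for Monday's run,
so at that level of confidence the same theory survives both days.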

In historical linguistics, however, you don't have 100 clear-cut
trials. You have 5, perhaps 10, subjective comparisons --- which means
that the result is consistent with nearly anything at all. What's
more, you get to pick and choose the comparisons you report. That's
like saying "I flipped the coin a lot of times, and wrote down every
time I got a head. See, 100 tallies, that must prove something." All
it proves is that you sat there long enough looking for matches.
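
The tally-sheet effect is easy to simulate. The sketch below assumes a
simple stopping rule (keep flipping until 100 heads are written down);
the function name, the seed and the default of a fair coin are all
choices made for illustration.

```python
import random

def flips_until_tallies(target_tallies, p_heads=0.5, seed=0):
    """Flip a coin until target_tallies heads have been recorded,
    writing down only the heads. Returns (tallies, total_flips)."""
    rng = random.Random(seed)
    tallies = 0
    flips = 0
    while tallies < target_tallies:
        flips += 1
        if rng.random() < p_heads:
            tallies += 1  # only the matches get written down
    return tallies, flips

tallies, flips = flips_until_tallies(100)
print(f"{tallies} tallies reported, {flips} flips actually made")
```

Whatever value p_heads takes (as long as it is above zero), the report
always shows exactly 100 tallies. Only the unreported number of flips
says anything about the coin.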

   No.  I flip a fair coin (the 'cause') 100 times and get tails (the
   'effect') 100 times -- unlikely, but certainly possible.  The
   probability that I get tails on the 101-st toss is still 1/2, not 1.

On the other hand, Brian, Pat never said the coin was fair.

Lars Mathiesen (U of Copenhagen CS Dep) <thorinn at diku.dk> (Humour NOT marked)


