[Q] Universals, Statistics

Fri Sep 26 20:38:48 UTC 2003

Some of the issues Eddy Ruys raises are addressed in the following paper of
mine:

Dryer, Matthew S. 1998 "Why Statistical Universals are Better Than Absolute
Universals".  Chicago Linguistic Society 33: The Panels, pp. 123-145.

a version of which is downloadable at

http://linguistics.buffalo.edu/people/faculty/dryer/dryer/cls97.pdf

However, let me also point out a way in which one of the problems he
raises is more real than many typologists recognize.  Suppose we have a
large typological database containing data on a large number of typological
features, like the database in the forthcoming World Atlas of Language
Structures.  And suppose someone writes a program that investigates the set
of all logically possible generalizations relating any pair of these
typological features; suppose there are 10,000 of them.  And suppose they
apply an appropriate statistical test and find out that 500 of these 10,000
possible generalizations prove to be statistically significant at the .05
level.

Hopefully, the problem is obvious: even if none of the 10,000 generalizations
were valid, we would expect about 5% of them (about 500) to come out as
statistically significant at the .05 level, simply due to chance.
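
The arithmetic can be checked with a small simulation.  This is only a
sketch under stated assumptions: it supposes that none of the 10,000
generalizations is valid, that the tests are independent, and that each
test's p-value is therefore uniformly distributed between 0 and 1.  The
function name and the seed are mine, chosen for illustration.

```python
import random

def count_false_positives(n_tests=10_000, alpha=0.05, seed=0):
    """Count tests that come out 'significant' when every null is true.

    Assumption: under a true null hypothesis a p-value is uniform on
    [0, 1), so each test is modelled as one uniform random draw.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)

# Expected number of chance "discoveries": 10,000 * 0.05 = 500.
expected = 10_000 * 0.05
observed = count_false_positives()
print(f"expected about {expected:.0f}, simulated {observed}")
```

The simulated count varies with the seed but stays close to 500, which is
the number of "significant" results in the hypothetical case above.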

I don't know of an easy solution to this problem.  One partial remedy is to
restrict attention to testing generalizations that are predicted by some
theory, rather than testing large sets of randomly generated hypotheses.  But
even here, some pseudo-generalizations will slip through the cracks.  Another is
to insist on a stricter level of statistical significance.  But if we
insist on the independence required by most statistical tests, then it is
very hard to achieve such a level.  And a smaller number of
pseudo-generalizations will still slip through the cracks.  In other
sciences, one can run the experiment again; but in typology, we can't find
another planet with humans speaking a new set of languages.
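
To give a rough sense of what a stricter level would involve, here is a
sketch using the Bonferroni correction, which divides the significance
level by the number of tests.  That particular correction is my choice for
illustration, not something settled above, and the calculation again
assumes independent tests with every null hypothesis true.  The function
name is hypothetical.

```python
def stricter_level(n_tests=10_000, family_alpha=0.05):
    """Bonferroni-style per-test threshold and its consequences.

    Assumptions: all null hypotheses are true and tests are independent.
    Returns (per-test threshold, expected number of false positives,
    probability of at least one false positive).
    """
    per_test = family_alpha / n_tests
    expected_fp = n_tests * per_test
    p_at_least_one = 1 - (1 - per_test) ** n_tests
    return per_test, expected_fp, p_at_least_one

per_test, expected_fp, p_any = stricter_level()
print(f"per-test threshold: {per_test:.1e}")
print(f"expected false positives: {expected_fp:.2f}")
print(f"chance of at least one: {p_any:.3f}")
```

With 10,000 tests the per-test threshold is 0.05 / 10,000 = .000005, which
illustrates how hard such a level is to reach.  A less drastic level such
as .001 would still let about 10 chance results through (10,000 * .001).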

I would welcome any other suggestions about solving this problem.

I predict that pseudo-generalizations of this sort will be published in the
typological literature, and there will be little way to distinguish these
from valid generalizations.  Perhaps some already have been published.

Matthew Dryer