[Q] Universals, Statistics

Thu Sep 25 10:26:30 UTC 2003

Dear all,

I have just read the Pro and Con section in Ling. Typ. 7-1 on
statistical confirmation of implicational universals. Coming not
from Language Typology but from the Generative tradition, this
got me wondering about a more basic (no doubt, in your
eyes, trivial) question in relation to the statistical
treatment of language types. Since my understanding of statistics
probably ranks even below my understanding of typology, this question
may well give rise to some good laughs about those naive
generativists -- so much the better. In any case, please be assured
that my question comes from ignorance and a genuine interest,
not from any desire to annoy or offend.

Bluntly: I would guess that all typological generalizations
obtained by extracting (statistical) patterns, implicational
or otherwise, from a given database of languages are meaningless.

The reason I feel this way is that, if those patterns are first
extracted from the data and only then stated as generalizations over
language types, one is proceeding in a post-hoc manner. The correct
procedure, I would guess, would be to first hypothesize that
a certain pattern must exist, and then attempt to disconfirm this
hypothesis on the basis of a data set. After reading those articles
(but perhaps this is where my mistake lies) I was left with the
feeling that this is not the way people proceed.

It is as though (borrowing an example from Richard Feynman)
I were to observe the license-plate number ANZ 192 on my way
to work, then calculate the unlikelihood of observing
exactly this plate, and conclude there is some significance
to the observation, requiring an explanation.
   Even if this event itself were relatively unlikely (say, it's rare
for a plate to start with three alphabetical characters), given the
number of events I observe every day, some unlikely ones have a good
chance of occurring. If I didn't decide beforehand that I was
looking for rare license-plates (not rare hairdos), the observation
is not interesting.
   Likewise, given the number of possible patterns in a data set, some
statistically unlikely patterns will occur, even if the data set
were completely random and there existed no underlying laws or
tendencies governing human language variation.
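
To make the worry concrete, here is a minimal sketch in Python of what I
have in mind. Everything in it is an assumption made up for illustration:
200 "languages", 30 independent coin-flip binary features, and a plain
chi-square test on every pair of features at the 0.05 level. The names
(chi_square_2x2, count_spurious_patterns) are hypothetical, and this is not
meant to represent how any actual typological database is analysed.

```python
import itertools
import random

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_05 = 3.841


def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic (no continuity correction) for two
    equally long lists of booleans, treated as a 2x2 contingency table."""
    n = len(xs)
    a = sum(1 for x, y in zip(xs, ys) if x and y)          # both present
    b = sum(1 for x, y in zip(xs, ys) if x and not y)      # only first
    c = sum(1 for x, y in zip(xs, ys) if not x and y)      # only second
    d = n - a - b - c                                      # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A feature that never varies carries no association information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def count_spurious_patterns(n_languages=200, n_features=30, seed=0):
    """Generate purely random feature values and count how many feature
    pairs nevertheless look 'significantly' associated."""
    rng = random.Random(seed)
    # Hypothetical database: each feature is an independent fair coin flip
    # per language, so there is no underlying law to be found.
    features = [
        [rng.random() < 0.5 for _ in range(n_languages)]
        for _ in range(n_features)
    ]
    pairs = list(itertools.combinations(range(n_features), 2))
    significant = sum(
        1
        for i, j in pairs
        if chi_square_2x2(features[i], features[j]) > CHI2_CRITICAL_05
    )
    return significant, len(pairs)


if __name__ == "__main__":
    significant, total = count_spurious_patterns()
    print(f"{significant} of {total} feature pairs pass the 0.05 threshold")
```

With 30 features there are 435 pairs to test, so at the 0.05 level one
would expect on the order of twenty of them to pass by chance alone, even
though the data contain no structure at all by construction.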

I'd like to know whether this is nonsense, or in fact Typology
101 and standard practice.

Thank you for any comments.
Eddy

--
======================================================================
E.G. Ruys
U.i.L-OTS, Utrecht University
eddy.ruys at let.uu.nl     http://www.let.uu.nl/~Eddy.Ruys/personal/
030-2538439