R: [Q] Universals, Statistics

Paolo Ramat paoram at UNIPV.IT
Thu Sep 25 17:27:54 UTC 2003

----- Original Message -----
From: Eddy Ruys <eddy.ruys at let.uu.nl>
Sent: Thursday, September 25, 2003 12:26 PM
Subject: [Q] Universals, Statistics

> Dear all,
> I have just read the Pro and Con section in Ling. Typ. 7-1 on
> statistical confirmation of implicational universals. Coming not
> from Language Typology but from the Generative tradition, this
> got me wondering about a more basic (no doubt, in your
> eyes, trivial) question in relation to the statistical
> treatment of language types. Since my understanding of statistics
> probably ranks even below my understanding of typology, this question
> may well give rise to some good laughs about those naive
> generativists -- so much the better. In any case, please be assured
> that my question comes from ignorance and a genuine interest,
> not from any desire to annoy or offend.
> Bluntly: I would guess that all typological generalizations
> obtained by extracting (statistical) patterns, implicational
> or otherwise, from a given database of languages are meaningless.
> The reason I seem to feel this way is that, if those patterns
> are obtained in this manner, and then stated as generalizations over
> language types, one proceeds in a post-hoc manner. The correct
> procedure, I would guess, would be to first hypothesize that
> a certain pattern must exist, and then attempt to disconfirm this
> hypothesis on the basis of a data set. After reading those articles
> (but perhaps this is where my mistake lies) I was left with the
> feeling that this is not the way people proceed.

The question is very simple and it goes back to the Greek philosophical
tradition: there are two ways of doing research, the inductive 'way up' and
the deductive 'way down'. The first one, which is usually the way
typologists choose, starts from the observation of the REAL facts and makes,
if possible, (implicational) generalizations. Real languages represent the
starting point. The second one is the way generativists prefer: to "first
hypothesize that a certain pattern must exist" is an a priori choice which
(usually in a correct way) derives from the theoretical paradigm one has.
The problem is that generativists and typologists hardly ever speak to each
other and don't understand that the two ways are complementary and both
necessary. Collecting huge amounts of data without being able to see that
they are non-random but on the contrary have a rationale in themselves is
useless. The same holds for a theory which is constructed a priori and
disregards the facts: in your example the fact that plates starting with
three letters may exist is by no means uninteresting: what matters is first
to take account of their existence and then to look for the reasons why they
may exist.
Paolo Ramat

> It is as though (borrowing an example from Richard Feynman)
> I were to observe the license-plate number ANZ 192 on my way
> to work, then calculate the unlikelihood of observing
> exactly this plate, and conclude there is some significance
> to the observation, requiring an explanation.
>    Even if this event itself were relatively unlikely (say, it's rare
> for a plate to start with three alphabetical characters), given the
> number of events I observe every day, some unlikely ones have a good
> chance of occurring. If I didn't decide beforehand that I was
> looking for rare license-plates (not rare hairdos), the observation
> is not interesting.
>    Likewise, given the number of possible patterns in a data set, some
> statistically unlikely patterns will occur, even if the data set
> were completely random and there existed no underlying laws or
> tendencies governing human language variation.
> I'd like to know whether this is nonsense, or in fact Typology
> 101 and standard practice.
> Thank you for any comments.
> Eddy
> --
> ======================================================================
> E.G. Ruys
> U.i.L-OTS, Utrecht University
> eddy.ruys at let.uu.nl     http://www.let.uu.nl/~Eddy.Ruys/personal/
> 030-2538439
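
The point in the quoted message about unlikely patterns turning up in a
completely random data set can be illustrated with a small simulation. The
following is only a sketch under stated assumptions, not anything taken from
the thread or from the Ling. Typ. articles: the sample size, the number of
features, the 0.05 threshold, the use of Fisher's exact test, and the names
fisher_two_sided and count_spurious_pairs are all hypothetical choices made
for illustration. It fills a table of "languages" with coin-flip binary
features, tests every pair of features for association after the fact, and
counts how many pairs pass the threshold even though no law connects them.

```python
import random
from itertools import combinations
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with the row and column totals held fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def count_spurious_pairs(n_languages=100, n_features=20, alpha=0.05, seed=0):
    """Search pure noise for 'significant' feature pairs (all sizes assumed)."""
    rng = random.Random(seed)
    # Each language gets independent coin-flip features: no real pattern exists.
    data = [[rng.random() < 0.5 for _ in range(n_features)]
            for _ in range(n_languages)]
    hits, n_pairs = 0, 0
    for i, j in combinations(range(n_features), 2):
        a = sum(1 for row in data if row[i] and row[j])
        b = sum(1 for row in data if row[i] and not row[j])
        c = sum(1 for row in data if not row[i] and row[j])
        d = n_languages - a - b - c
        n_pairs += 1
        if fisher_two_sided(a, b, c, d) < alpha:
            hits += 1
    return hits, n_pairs


if __name__ == "__main__":
    hits, n_pairs = count_spurious_pairs()
    print(f"{hits} of {n_pairs} feature pairs pass p < 0.05 in pure noise")
    print(f"Bonferroni threshold for the same search: {0.05 / n_pairs:.5f}")
```

With 20 features there are 190 pairs to inspect, so at a 0.05 threshold one
would expect a handful to pass by chance alone. A pattern hypothesized before
looking at the data faces a single test; a pattern found by searching has to
be judged against the whole search, for instance with the stricter Bonferroni
threshold printed on the last line.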
