Doing historical linguistics (part 1)

Thu Nov 12 01:27:35 UTC 1998

----------------------------Original message----------------------------
Mark Hubey writes:

>em (to suck), am (cunt in Turkish), amma (mother), amcik (pussy),
>emesal (female speech in Sumerian), emcek (breasts, udder), meme
>(breast), emzirik, etc etc.
>
>Do you understand the probabilistic implication of such patterns?
>

I'll take the liberty of commenting, as I understand the probabilistic
implication of such patterns.  What are the chances of finding, in any
language, a word consisting of a vowel (any vowel?  or just a non-round
vowel?) followed by [m], with or without any following material of any
shape, having any one of the 7 glosses cited above, or presumably any
other meaning that might have to do with females?

That boils down to:  what are the chances of finding a sequence Vm(-) in
any of a wide (or open-ended) set of meanings?

Success is virtually guaranteed, since you get to keep looking until you
find something that fits.
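To put a rough number on "keep looking until you find something": suppose each word you inspect has some small chance q of both having the shape Vm(-) and carrying one of the admissible meanings, and treat the words as independent. Then the chance of at least one hit among N words is 1 - (1 - q)^N. The Python sketch below assumes that independence and uses a made-up q of 0.001 purely for illustration; it is not an estimate for any real language.

```python
def chance_of_some_match(q: float, n_words: int) -> float:
    """Chance that at least one of n_words independent words matches,
    when each word matches by accident with probability q."""
    return 1.0 - (1.0 - q) ** n_words


if __name__ == "__main__":
    # Hypothetical per-word chance of a Vm(-) form with a usable meaning.
    q = 0.001
    for n_words in (100, 1_000, 10_000):
        print(n_words, round(chance_of_some_match(q, n_words), 3))
```

With that hypothetical q, the chance is about 0.1 for 100 words and about 0.63 for 1,000, and it is practically 1 for a dictionary-sized search of 10,000 words.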

I've computed the chance of finding resemblances in which two similar
consonants occur in the same order in words with similar senses.  If you're
allowed:

        (1) a couple of distinctive features' leeway in defining generic
consonants (so e.g. p, b, and f are all taken to match)

        (2)  and much phonotactic leeway (so e.g. epte, fad, upatha,
bdezolg, pet, puot, etc., etc. are all taken to match because they all have
a p, f, or b followed later in the word by a t, d, or th, and no
intervening consonant)

        (3)  and up to five senses' leeway (e.g. 'black', 'night', 'dark',
'soot', 'shadow' or any other set of five meanings you consider related; or
e.g. 'fingernail', 'finger', 'hand', 'arm', 'claw')
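Criteria (1) and (2) amount to a simple matching rule, sketched below in Python. This is only an illustration under stated assumptions: words are given in a rough romanization rather than a phonetic transcription, the vowels are taken to be a, e, i, o, u, "th" counts as a single consonant, and the generic classes are exactly {p, b, f} and {t, d, th} from the example above. The semantic leeway of criterion (3) is not modeled. The names `segments` and `has_p_t_sequence` are hypothetical.

```python
VOWELS = set("aeiou")          # assumption: romanized vowels only
LABIALS = {"p", "b", "f"}      # generic "P" consonant, criterion (1)
DENTALS = {"t", "d", "th"}     # generic "T" consonant, criterion (1)


def segments(word: str) -> list[str]:
    """Split a romanized word into segments, treating 'th' as one consonant."""
    word = word.lower()
    segs, i = [], 0
    while i < len(word):
        if word.startswith("th", i):
            segs.append("th")
            i += 2
        else:
            segs.append(word[i])
            i += 1
    return segs


def has_p_t_sequence(word: str) -> bool:
    """True if a labial is followed later by a dental with only vowels
    (no other consonant) in between, as in criterion (2)."""
    pending_labial = False
    for seg in segments(word):
        if seg in VOWELS:
            continue  # vowels may intervene freely
        if pending_labial and seg in DENTALS:
            return True
        # Any other consonant ends the current search; a labial starts a new one.
        pending_labial = seg in LABIALS
    return False


if __name__ == "__main__":
    for w in ["epte", "fad", "upatha", "bdezolg", "pet", "puot", "tap", "pst"]:
        print(w, has_p_t_sequence(w))
```

All six example words from criterion (2) pass, while "tap" (wrong order) and "pst" (intervening s) do not. That makes it easy to see how permissive the rule is.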

then the event probability of such a match is 0.04, and 25 such matches out
of a pre-specified 100-word list are required to reach the 95% confidence
level on a binary test of two languages.  (That's the conventional minimum
level for deciding that the number and degree of resemblances are not
random.  I got the number of 25 from a binomial probability table for an
event probability of 0.04 and 100 trials.)  For one-consonant sets like
Mark's, over half of the 100-word list would have to consist of matches in
order to reach 95% confidence.  All this assumes that you prespecify the 100
glosses, prespecify the five-sense range for each, prespecify the generic
consonants, and choose in advance the two languages you want to compare.
If you get to look through all the words (i.e. entire dictionaries) of any
languages you like, then the required numbers of matches go up.
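A binomial-table lookup of the kind described above can be reproduced in a few lines of Python. This is a minimal sketch that assumes independent items and a one-tailed 95% criterion. It also assumes that "event probability" means the chance that a single pre-specified list item yields a match by accident. The required count is quite sensitive to how that per-item probability is defined (for instance, whether the five-sense leeway is already folded into it). The functions are therefore a way to explore the computation, not a reconstruction of the exact figures given above. The names `upper_tail` and `matches_needed` are hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def matches_needed(n: int, p: float, alpha: float = 0.05) -> int:
    """Smallest number of matches k such that chance alone produces k or more
    matches with probability at most alpha (one-tailed test)."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return n + 1  # not reachable even if every item matches


if __name__ == "__main__":
    # Plug in the per-item probability you have actually defined.
    for p in (0.04, 0.2, 0.4):
        print(p, matches_needed(100, p))
```

Larger per-item probabilities, such as those for one-consonant patterns with broad semantic leeway, push the required number of matches up quickly.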

So the probabilistic implication of such patterns is nil, unless you have
over 50 of them out of some standard 100-word list.  Such are the hazards
of doing open-ended searches.  (This brings us to Yeniseian and Na-Dene, of
which more anon.)

The computation of probability in limited searches is described (briefly)
in my paper 'The comparative method as heuristic' in M. Durie and M. Ross,
eds., The Comparative Method Reviewed (Oxford UP, 1996).  I'm working on a
fuller explanation.

Johanna Nichols

* * * * * * * * * * * * * * * * * * * * *
Johanna Nichols
Professor
Department of Slavic Languages
Mailcode 2979
University of California, Berkeley
Berkeley, CA 94720, USA

Phone:  (1) (510) 642-1097 (direct)
        (1) (510) 642-2979 (messages)
Fax:    (1) (510) 642-6220 (departmental)
* * * * * * * * * * * * * * * * * * * * *