phonetic resemblances, etc.

Sun Jan 31 02:29:05 UTC 1999

----------------------------Original message----------------------------
An important statistical point.  If this list

>
>>          Mamulique   Garza      Comecrudo
>>
>>sun       atl        ai         al
>>moon      kan        an         eskan
>>water     aha(?)     axe        apanekla
>>road      --         aie        aaul
>>man       (kessem)   knarxe     na
>>woman     kem        kem        kem
>>sky       --         apiero     apel
>>

were the entirety of our data on those three languages, we would be
justified in considering relatedness to be probable.  For each of the seven
glosses, at least two and often all three of the languages have resemblant
forms; for each of the languages, each word resembles one or both of those
of the other languages.

Even if we reduce the phonetic resemblance to the binary distinction
between initial "k" vs. initial "a", or even initial C vs. initial V, I
suspect that the number of matches would exceed what is expected by chance.
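
As a rough illustration of that binary reduction, the sketch below classes each quoted form by whether it begins with a vowel or a consonant and counts the glosses on which all attested forms agree. The forms are copied from the list above with the "(?)" and parentheses stripped; treating a, e, i, o, u as the vowels, and the helper names, are assumptions made for this example. It only counts agreements and does not estimate the chance baseline.

```python
# Forms copied from the quoted list; None marks a missing form ("--").
WORDLIST = {
    "sun":   ("atl", "ai", "al"),
    "moon":  ("kan", "an", "eskan"),
    "water": ("aha", "axe", "apanekla"),
    "road":  (None, "aie", "aaul"),
    "man":   ("kessem", "knarxe", "na"),
    "woman": ("kem", "kem", "kem"),
    "sky":   (None, "apiero", "apel"),
}

VOWELS = set("aeiou")  # assumption: orthographic vowels only


def initial_class(form):
    """Return 'V' if the form starts with a vowel letter, else 'C'."""
    return "V" if form[0] in VOWELS else "C"


def unanimous_glosses(wordlist):
    """List the glosses whose attested forms all share one initial class."""
    agreeing = []
    for gloss, forms in wordlist.items():
        classes = {initial_class(f) for f in forms if f is not None}
        if len(classes) == 1:
            agreeing.append(gloss)
    return agreeing


agreeing = unanimous_glosses(WORDLIST)
print(len(agreeing), "of", len(WORDLIST), "glosses agree:", agreeing)
```

On these seven glosses only "moon" splits (kan against an and eskan). Whether that exceeds chance depends on how often initial vowels and consonants occur in each language, which this short list cannot tell us.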

Problem is (if I recall the article correctly), this isn't the entirety of
our data. It's the seven most nearly resemblant glosses that could be
picked out of a larger list.  That is, it's all and only the positive
evidence.

It is that statistical consideration, and not the closeness of the phonetic
resemblances or the regularity of correspondences, that makes this data set
poor evidence for relatedness.

Here's a simpler analog.  What are the chances that if you toss a coin six
times you will get heads all six times?  Very small.  What are the chances
that, if you toss the coin a couple hundred times, somewhere in the record
of those tosses there will be six successive heads?  Excellent, because
this is a search for positive evidence with no attention to the number of
failures.
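
The two coin-toss probabilities can be worked out directly. The sketch below assumes a fair coin and takes "a couple hundred" to mean 200 tosses; the function name prob_run is made up for this example. It tracks the probability of each current streak length toss by toss, and accumulates the probability that a streak of the target length has been completed.

```python
def prob_run(n_tosses, run_length, p_heads=0.5):
    """Probability of at least one run of `run_length` heads in `n_tosses`."""
    # state[j] = probability of currently having exactly j trailing heads
    # without having yet completed a full run.
    state = [0.0] * run_length
    state[0] = 1.0
    hit = 0.0  # probability that a full run has already occurred
    for _ in range(n_tosses):
        new_state = [0.0] * run_length
        for streak, prob in enumerate(state):
            # A tail resets the streak to zero.
            new_state[0] += prob * (1.0 - p_heads)
            if streak + 1 == run_length:
                # A head completes the run; this mass is absorbed.
                hit += prob * p_heads
            else:
                new_state[streak + 1] += prob * p_heads
        state = new_state
    return hit


six_of_six = prob_run(6, 6)        # exactly six tosses, all heads
six_in_200 = prob_run(200, 6)      # six successive heads somewhere in 200
print(f"six heads in six tosses:       {six_of_six:.4f}")
print(f"six heads in a row in 200:     {six_in_200:.4f}")
```

The first figure is 1/64, under 2 percent; the second comes out well above one half. The gap between the two is the effect of searching a long record for a success without counting the failures.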

(Do I recall correctly that only these three languages were compared in
Goddard's original article?  In that case the evidence, though weak because
selected from a longer list of forms, is still stronger than the evidence
usually offered in multilateral comparison, where the searcher in addition
gets to choose from a larger set of languages.  If we were to add Navajo
and Ket data to the larger wordlist, we could easily find seven sets in
which at least one of Mamulique, Garza, Comecrudo, Ket, and Navajo
resembled at least one of the others even more closely than most of the
resemblances in the three-language set above.  None of the people involved
in this discussion is advocating that approach; I'm just pointing out that,
in principle, a closed set of three languages is a firmer basis for
comparison than a larger set offering more options in comparison.)

From wordlists as small and unreliable as the three above we know so little
about the languages that there's little point in debating whether the
resemblances are phonetically close, whether correspondences are regular,
etc.  Words consist of more than sounds.  For all we know, these could be
gender-prefixing languages in which inanimates have /a-/ and animates have
/k-/.  In that case the phonetics is immaterial; it's the morphemes that
yield the resemblance.  We have no idea what these forms represent, but if
this were the entirety of our data we could suspect relatedness without
positing an analysis (morphological or phonological).  That would be an
example of relatedness justifiably hypothesized from lexical material on
the basis of something other than phonetic resemblance.

Johanna Nichols
Professor
Department of Slavic Languages
Mailcode 2979
University of California, Berkeley
Berkeley, CA 94720, USA

Phone:  (1) (510) 642-1097 (direct)
        (1) (510) 642-2979 (messages)
Fax:    (1) (510) 642-6220 (departmental)