Doing historical linguistics (part 1)

H.M.Hubey hubeyh at montclair.edu
Thu Nov 12 12:47:59 UTC 1998


----------------------------Original message----------------------------
Johanna Nichols wrote:
>
>
> >em (to suck), am (cunt in Turkish), amma (mother), amcik (pussy),
> >emesal (female speech in Sumerian), emcek (breasts, udder), meme
> >(breast),
> >emzirik, etc etc.
> >
> >Do you understand the probabilistic implication of such patterns?
> >
>
> I'll take the liberty of commenting, as I understand the probabilistic
> implication of such patterns.  What are the chances of finding, in any
> language, a word consisting of a vowel (any vowel?  or just a non-round
> vowel?) followed by [m], with or without any following material of any
> shape, having one or the other of the set of 7 glosses cited above plus
> presumably any other meaning that might have to do with females?
 
No. I was thinking of something more.
 
If words (phonological shapes and their meanings) did not cluster there
would be no such thing as etymology. If some word X or its reflexes
shows up in a family in random scatter but is strongly represented
in another family, what does that imply?
 
Secondly, if the words of two different languages were generated
independently of each other, then if there are chance occurrences
it is not only their number that matters but also their
patterns. Merely counting them gives no indication of
patterns. Word generation is a Markov process, so the tests should
be run on some model that purports to be a model of language
evolution/development.
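
To make the Markov point concrete, here is a minimal sketch of what
such a test could look like. Everything in it is an assumption made up
for illustration: the transition table, the segment inventory, and the
resemblance criterion (same first two segments) are toy choices, not a
real model of language development. It generates two independent word
lists from a first-order Markov chain and estimates how often words in
the same meaning slot resemble each other by chance.

```python
import random

# Hypothetical toy model (assumption): states are segment classes,
# "#" marks the word boundary. Probabilities are invented.
TRANSITIONS = {
    "#": [("V", 0.4), ("C", 0.6)],
    "C": [("V", 0.7), ("C", 0.1), ("#", 0.2)],
    "V": [("C", 0.6), ("V", 0.1), ("#", 0.3)],
}
SEGMENTS = {"C": "ptkbdgmnsl", "V": "aeiou"}


def generate_word(rng, max_len=8):
    """Walk the chain from the word boundary until it returns to it."""
    state, out = "#", []
    while len(out) < max_len:
        states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(states, weights=weights)[0]
        if state == "#":
            break
        out.append(rng.choice(SEGMENTS[state]))
    return "".join(out)


def resembles(a, b):
    """Toy resemblance criterion (assumption): same first two segments."""
    return a[:2] == b[:2]


def chance_match_rate(n_meanings, trials, seed):
    """Monte Carlo estimate of the per-meaning chance-match rate for two
    word lists generated independently from the same chain."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lang_a = [generate_word(rng) for _ in range(n_meanings)]
        lang_b = [generate_word(rng) for _ in range(n_meanings)]
        hits += sum(resembles(x, y) for x, y in zip(lang_a, lang_b))
    return hits / (trials * n_meanings)


if __name__ == "__main__":
    print(chance_match_rate(n_meanings=100, trials=200, seed=1))
```

The same simulated lists could be used to look at where the matches
fall, not only how many there are, which is the part a bare count
leaves out.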
 
 
 
> That boils down to:  what are the chances of finding a sequence Vm(-) in
> any of a wide (or open-ended) set of meanings?
>
> Success is virtually guaranteed, since you get to keep looking until you
> find something that fits.
 
Sure, it is too easy.
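
The reason it is too easy can be put in one line of arithmetic. As a
rough sketch, assume each candidate word has some small, independent
probability of fitting the shape and one of the allowed meanings. The
value 0.01 below is a made-up illustration, not an estimate. The chance
of at least one hit then grows quickly with the number of candidates
examined:

```python
def prob_at_least_one(p_single, n_candidates):
    """P(at least one hit) among n independent candidates,
    each fitting with probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_candidates


if __name__ == "__main__":
    p = 0.01  # hypothetical per-candidate chance of a fit (assumption)
    for n in (10, 100, 1000):
        print(n, round(prob_at_least_one(p, n), 4))
```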
 
 
>
> I've computed the chance of finding resemblances in which two similar
> consonants occur in the same order in words with similar senses.  If you're
> allowed:
>
>         (1) a couple of distinctive features' leeway in defining generic
> consonants (so e.g. p, b, and f are all taken to match)
>
>         (2)  and much phonotactic leeway (so e.g. epte, fad, upatha,
> bdezolg, pet, puot, etc., etc. are all taken to match because they all have
> a p, f, or b followed later in the word by a t, d, or th, and no
> intervening consonant)
>
>         (3)  and up to five senses' leeway (e.g. 'black', 'night', 'dark',
> 'soot', 'shadow' or any other set of five meanings you consider related; or
> e.g. 'fingernail', 'finger', 'hand', 'arm', 'claw')
>
> then the event probability of such a match is 0.04, and 25 such matches out
> of a pre-specified 100-word list are required to reach the 95% confidence
> level on a binary test of two languages.  (That's the conventional minimum
> level for deciding that the number and degree of resemblances are not
> random.  I got the number of 25 from a binomial probability table for an
> event probability of 0.04 and 100 trials.)  For one-consonant sets like
> Mark's, over half of the 100-word list would have to consist of matches in
> order to reach 95% confidence.  All this is if you prespecify the 100
> glosses, prespecify the range of 5 for each, and prespecify the generic
> consonants.  And choose in advance the two languages you want to compare.
> If you get to look through all words (i.e. entire dictionaries) of any
> languages, then the required numbers of matches go up.
 
Some of these computations were also done on the Language list, which
is mainly for quantitative approaches.
 
Yes, but this only takes into account number and not pattern. There are
lots of ways of testing things.
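
For reference, the count-only kind of test can be sketched as a plain
binomial tail calculation. This assumes independent trials with one
fixed event probability, and the function names are my own. It is not
an attempt to reproduce the table lookup described above, and the
threshold it returns depends entirely on these assumptions, so it need
not agree with the quoted figure.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def min_matches(n, p, alpha=0.05):
    """Smallest k such that P(X >= k) <= alpha under the binomial model."""
    for k in range(n + 2):
        # k = n + 1 gives an empty sum (probability 0), so the loop
        # always returns.
        if binom_tail(n, p, k) <= alpha:
            return k


if __name__ == "__main__":
    # Inputs taken from the quoted text: 100 trials, event probability 0.04.
    print(min_matches(100, 0.04))
```

Note that this only counts matches. It says nothing about how the
matches are distributed, which is the point about pattern.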
 
 
> So the probabilistic implication of such patterns is nil, unless you have
> over 50 of them out of some standard 100-word list.  Such are the hazards
> of doing open-ended searches.  (This brings us to Yeniseian and Na-Dene, of
> which more anon.)
 
I was talking about the implication of patterns of words in a given
language.
 
One of the biggest problems that I encounter all the time is that
neutral evidence works to the advantage of the dominant theory.
 
For example, most of the IE words could be due to a substratum, which
could itself have been a family. One can always insist that IE
words resemble each other because they are all left over from a
previous language which was spread out over the same region.
 
 
> The computation of probability in limited searches is described (briefly)
> in my paper 'The comparative method as heuristic' in M. Durie and M. Ross,
> eds., The Comparative Method Reviewed (Oxford UP, 1996).  I'm working on a
> fuller explanation.
 
Yes, I read it about a year ago.
 
I think you had an article on comparison of Hittite with others.
 
 
--
Best Regards,
Mark
-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
hubeyh at montclair.edu =-=-=-= http://www.csam.montclair.edu/~hubey
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



More information about the Histling mailing list