6.409 Comparative method

The Linguist List linguist at tam2000.tamu.edu
Wed Mar 22 06:25:59 UTC 1995


----------------------------------------------------------------------
LINGUIST List:  Vol-6-409. Wed 22 Mar 1995. ISSN: 1068-4875. Lines: 198
 
Subject: 6.409 Comparative method
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
 
Asst. Editors: Ron Reck <rreck at emunix.emich.edu>
               Ann Dizdar <dizdar at tam2000.tamu.edu>
               Ljuba Veselinova <lveselin at emunix.emich.edu>
               Annemarie Valdez <avaldez at emunix.emich.edu>
 
-------------------------Directory-------------------------------------
 
1)
Date: Thu, 16 Mar 1995 10:37:06 -0500
From: Alexis Manaster Ramer (amr at CS.Wayne.EDU)
Subject: Re:  6.374 Focus Systems, Comparative Syntax
 
2)
Date: Sat, 18 Mar 1995 14:57:40 +1100 (EST)
From: j.guy at trl.OZ.AU (Jacques Guy)
Subject: Comparative anything (syntax, lexicon, amino-acids...)
 
-------------------------Messages--------------------------------------
1)
Date: Thu, 16 Mar 1995 10:37:06 -0500
From: Alexis Manaster Ramer (amr at CS.Wayne.EDU)
Subject: Re:  6.374 Focus Systems, Comparative Syntax
 
Lloyd Anderson's latest posting makes it seem as though there were
some conceptual or terminological difficulty surrounding the
question of binary vs. n-ary comparison (where n is greater than 2).
However, the mathematics involved is well understood, and we
can easily calculate, for any given set of circumstances, whether
one or the other method is more likely to yield false positives
or false negatives.  And it is a simple fact that under some
highly artificial conditions, binary comparison can be better
than n-ary (for smallish values of n), although as n grows,
n-ary comparison will always end up being better, where by better
I mean less likely to yield false positives (that is, spurious
claims of relatedness) or false negatives (that is, failures to
detect genuine relationships).  As usual, if we get away from
political statements about "comparative method" vs. "long-range
comparison" and stick to the specific linguistic and mathematical
issues, the answers are unambiguous and not all that hard to find.
 
Alexis MR
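Manaster Ramer does not spell out the calculation. As a hedged illustration only, the toy binomial model below shows what such a calculation can look like; the chance-resemblance probability, the retention rate, the list length, the match threshold and all function names are hypothetical, and the independence assumptions are simplifications, not his. For unrelated languages a meaning counts as a match only if each of the other n-1 lists resembles the first by chance; for related languages it counts only if all n lists kept the inherited word. Relatedness is claimed when at least k of the M meanings match.

```python
from math import comb


def binom_tail(trials: int, q: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(trials, q)."""
    return sum(comb(trials, j) * q**j * (1 - q) ** (trials - j)
               for j in range(k, trials + 1))


def false_positive_rate(p_chance: float, n: int, meanings: int, k: int) -> float:
    """Unrelated languages: probability of reaching k 'matches' anyway.

    Assumption: a meaning matches only if each of the other n-1 lists
    resembles the first by chance, independently, with probability p_chance.
    """
    q = p_chance ** (n - 1)
    return binom_tail(meanings, q, k)


def false_negative_rate(r_retain: float, n: int, meanings: int, k: int) -> float:
    """Related languages: probability of falling short of k matches.

    Assumption: a meaning matches only if all n lists kept the inherited
    word, each independently with probability r_retain (chance look-alikes
    are ignored).
    """
    q = r_retain ** n
    return 1.0 - binom_tail(meanings, q, k)


if __name__ == "__main__":
    # Hypothetical settings: 100-item lists, 10 matches taken as proof.
    for n in (2, 3, 4):
        fp = false_positive_rate(0.05, n, 100, 10)
        fn = false_negative_rate(0.5, n, 100, 10)
        print(f"n={n}: false positive {fp:.2e}, false negative {fn:.2e}")
```

With a fixed threshold, raising n drives false positives down sharply while false negatives climb, so a fair comparison re-tunes k for each n. Which method wins then depends on the parameters, consistent with the remark that binary comparison can win under some conditions.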
 
--------------------------------------------------------------------------
2)
Date: Sat, 18 Mar 1995 14:57:40 +1100 (EST)
From: j.guy at trl.OZ.AU (Jacques Guy)
Subject: Comparative anything (syntax, lexicon, amino-acids...)
 
 
... or: the reduced mutation algorithm and other things.
 
Lloyd Anderson asks in which respects the "reduced mutation algorithm"
fell down. In two respects.
 
1. By the proof of the pudding. Hartigan had given, along with the
   description of this algorithm, a wordlist in I forgot how many
   languages, supplied by Dyen. One could surmise, then, that it had
   some seal of approval for this type of data. I applied it to language
   families computer-generated under the strict condition of a constant
   universal rate of lexical change. Here is my report of the eating
   of the pudding:
 
   The program was fed the wordlists of the simulated language family,
   and a phylogenetic tree ([26]) drawn from the account of the
   successive mergings of lists and of the predicted past individual
   word replacements. The tree thus reconstructed is strikingly similar
   to tree [12b], obtained by traditional lexicostatistical techniques
   using the mean-percentage method and a zero tolerance.
     As implemented, the reduced mutation algorithm was extremely slow,
   requiring about 120 seconds of CPU time on a DEC-KL10, whereas none
   of the other methods described so far had taken more than 0.5 seconds
   to process the percentage table, which had been produced from the
   wordlists in just 0.4 seconds.
             (Experimental glottochronology: basic methods and results.
              Pacific Linguistics, Canberra, 1980. p.19)
 
 
   Performance [based on eight experimental simulations]
 
     The reduced mutation algorithm identified the basic binary split in
   all experiments, but did not succeed, even once, in reconstructing
   the subsequent ternary split of ECHO-SIERRA, either as such, or as
   two successive binary splits.
 
   Discussion
 
     The reasons for the resounding failure of the reduced mutation
   algorithm are somewhat akin to those for the failure of the
   traditional lexicostatistical method: the measure of the similarity or
   of the distance between two languages is based on data from just two
   wordlists. The measure of distance used by the reduced mutation
   algorithm is furthermore not reconcilable, at least in my eyes, with
   the linguistic model. Interested readers should refer to Hartigan
   1975:233-246.
 
             (Ditto, p.33)
 
   The book in question is: Hartigan, John A.  Clustering Algorithms.
   Wiley, New York, 1975.
 
2. On methodological grounds. As I had already suspected 15 years ago,
   the metric used does not mirror the quantitative properties of
   language families, and the clustering algorithm itself compounds
   errors instead of factoring them out.
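Guy's test bed, a family generated by computer under a constant universal rate of lexical change, is easy to mimic. The Python sketch below is a hypothetical reconstruction, not Guy's 1980 program and not Hartigan's reduced mutation algorithm: the tree, the branch lengths, the 200-item lists, the retention rate of 0.8 per millennium and the function names are all assumptions made for illustration. It includes a ternary split, since that is what the reduced mutation algorithm failed to recover, and it stops at the pairwise percentage table, every cell of which is computed from just two wordlists.

```python
import itertools
import random

# Hypothetical family: (parent, child, branch length in millennia).
# Parents are listed before their children; C, D and E form a ternary split.
TREE = [
    ("proto", "P", 2), ("proto", "Q", 2),
    ("P", "A", 3), ("P", "B", 3),
    ("Q", "C", 3), ("Q", "D", 3), ("Q", "E", 3),
]
LEAVES = ["A", "B", "C", "D", "E"]


def evolve(words, millennia, retention, rng, fresh):
    """Constant rate: each item survives a millennium with probability
    `retention`; otherwise it is replaced by a brand-new form."""
    out = list(words)
    for _ in range(millennia):
        for i in range(len(out)):
            if rng.random() >= retention:
                out[i] = next(fresh)  # new forms never match by chance
    return out


def simulate(size=200, retention=0.8, seed=1):
    """Run every branch of TREE and return the wordlists of the leaves."""
    rng = random.Random(seed)
    fresh = itertools.count(size)  # ids not used by the proto-list
    lists = {"proto": list(range(size))}
    for parent, child, t in TREE:
        lists[child] = evolve(lists[parent], t, retention, rng, fresh)
    return {name: lists[name] for name in LEAVES}


def percentage_table(lists):
    """Percentage of shared items for each pair; every cell is computed
    from exactly two wordlists and nothing else."""
    table = {}
    for a, b in itertools.combinations(sorted(lists), 2):
        shared = sum(x == y for x, y in zip(lists[a], lists[b]))
        table[(a, b)] = 100.0 * shared / len(lists[a])
    return table


def expected_percentage(retention, millennia_since_split):
    """Constant-rate expectation r**(2t): both lineages must keep the item."""
    return 100.0 * retention ** (2 * millennia_since_split)


if __name__ == "__main__":
    for pair, pct in percentage_table(simulate()).items():
        print(pair, f"{pct:.1f}%")
    print("expected, split 3 millennia ago:", f"{expected_percentage(0.8, 3):.1f}%")
    print("expected, split 5 millennia ago:", f"{expected_percentage(0.8, 5):.1f}%")
```

Comparing the simulated percentages with the constant-rate expectation r^(2t) shows how much sampling noise each cell carries. Any clustering step run on this table, mean-percentage or otherwise, inherits that noise and sees only pairwise figures, which is the limitation Guy's Discussion points to.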
 
And now for the umpteenth time around (*sigh*) ...
 
Lloyd Anderson writes:
 
"On the use of a biological "reduced-mutation algorithm" applied to
linguistic data... we are ... positing a set of historical chains of
development by which a language or languages with GIVEN starting points
CAN develop step by step into descendents leaving the evidence along the
way and the results today which we have as our evidence" (my emphasis)
 
Precisely. GIVEN starting points. CAN develop. This parallels biology.
Biologists are helped by the fossil record, linguists by documentary
evidence, dated or datable. But most of the world's languages lack this
evidence. And beyond some 5000 years in the past, the evidence is, in
all cases and for all practical purposes, zilch.
  The starting points, then, are GIVEN if and only if the ancestor
languages have been preserved. When they have not, we, to paraphrase
Lloyd Anderson, "are positing, WITHOUT the benefit of evidence left
along the way, a set of historical chains of development by which
languages which we have as our evidence today COULD HAVE developed step
by step from HYPOTHETICAL starting points".
 
Quoting further:
 
"This "step by step" is like a minimal series of mutations, with the
added information that it is our business to learn which changes
(mutation steps) are more natural, and OF COURSE MOST of these go
only in one direction". (My emphasis again).
 
No.
 
First, it is not true that most changes go only in one direction. Far
from it. Most changes can take place in either of two opposite directions.
Thus in French /o/ (from /akwa/ "water") we have zero originating from
/k/ but in Cypriot /trika/ (from /tria/ "three") and /krika/ (from
/krea/ "pieces of meat") we have /k/ originating from zero. And note
that here I am using not hypothetical reconstructions but attested
ancestors, Latin and Ancient Greek.
 
Second, what is "natural"? Is it more natural to see a case system
shrink, like Germanic or Greek? Or expand, like Finnish? (another
example of change in opposite directions). Perhaps "natural" applies
here to phonetic changes. What is more natural, then? To develop a
bloated vowel system, like Norman French (see Martinet's description of
his mother's dialect), or to reduce it, like Castilian? (yet another
instance of change in opposite directions). Once upon a time Foley
voiced a theory of phonetic stability whereby labials, being more
front than dentals, were more resistant to weakening or loss, dentals
themselves being more stable than velars for the same reason. Sounds
reasonable, and natural, doesn't it? So much then for the whole Celtic
family! And Japanese, and Bau Fijian... and "naturalness".
 
Lastly, let it be granted that most changes, nay, ALL changes have been
observed to occur in ONLY ONE direction. They can have been observed so
only from "given starting points" (attested ancestors, e.g. Akkadian,
Sanskrit, Latin...) and their "descendents leaving the evidence along
the way and the results today which we have as our evidence". For, if they
had not, from what could the observation have arisen other than
HYPOTHETICAL starting points? (And we would be begging the question
again, as usual). Since the languages so attested are a very, very small
minority of the languages of the world, these hypothetical observations
would not be validly generalizable. It is the old story of the traveller
who drops into a pub in Dublin between two ships and leaves persuaded
that all Irishwomen have red hair because the barmaid had red hair.
There is nothing new under the sun.
 
Chretien and Kroeber had experimented with a metric similar to that of
Hartigan's reduced mutation algorithm in the 1930s, by the way. And
Dyen has, in 1992, again resorted to factor analysis, exactly as
Milke, uncomprehendingly, did in 1970. Indeed, there is nothing new under
the sun.
 
Oh well, as my Latin teacher was fond of telling us:
 
There are three secrets to teaching. They are:
 
1. Repeat.
2. Repeat.
3. Repeat.
 
He was wrong, you know. There are FOUR secrets:
 
4. Continue from step 1.
 
--------------------------------------------------------------------------
LINGUIST List: Vol-6-409.


