Assumptions in Computing phylogenies

Sat Mar 4 07:23:34 UTC 2000

>>Actually, properly done, cladistic analysis *determines* which
>>characters are innovations and which are retentions.

In a message dated 3/1/2000 7:27:04 PM, Hans_Holm at h2.maus.de replied:
>.. In my humble understanding it is vice versa.
>The biologist or linguist decides which features are retentions vs.
>innovations, and the cladistic algorithm computes the 'optimal' tree. See
>my parallel mail for a textbook on the topic.
>But perhaps it is a misunderstanding.

Well, as far as biology is concerned, cladistics is about finding out what
attributes count and which don't.

You may have noticed that some of our friends here on the list, in discussing
time, have put a lot of emphasis on cognates and other genetic features.
This, I think, reflects one logical approach to the problem: proceeding on the
idea that if a mass of the most genetically related portions of two languages
is very close, then the languages themselves must be close in time.

Cladistics sort of switches this kind of thinking around.  Traditional
classifications in biology were in fact based on "similarities" between
species and the closer the similarities were, the closer the expected
relationship.  Cladistics on the other hand disregards large blocks of
similarities on a rather wholesale basis.

The example that should alert one to the turnaround in thinking that
cladistics represents is given in Henry Gee's In Search of Deep Time - and in
the review of the book by Kevin Padian in the Feb '00 Scientific American
that Lloyd mentioned on the list a while ago.

And that's the example of the Salmon, the Lungfish and the Cow.

The lungfish and the salmon (like Sanskrit and Latin) have many, many things
in common, from specifics of average shape to scales and boniness (as
opposed to the shark's cartilage).  Both of course live in water and are
traditionally grouped as fishes.  And we can with some certainty identify the
common ancestor of the salmon and the lungfish - the paleocinids - and
identify a large number of retentions and shared innovations between salmon
and lungfish.

On the basis of those similarities and innovations, the traditional
classification of the salmon and the lungfish had them relatively closely
related on the vertebrate family tree.

The surprise is that cladistics groups the lungfish not with the salmon but
with the cow.  (Lest this appear obvious in any way, the specific innovations
cited, which the lungfish and cow share and the salmon lacks, are "nasal
passages that open in the throat and jointed bones in the fins/limbs".)
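
The contrast between grouping by overall similarity and grouping by shared
innovations can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions: the 0/1 character codings are
simplified inventions for this example, the shark is assumed as the outgroup,
and "derived" is taken to mean "differs from the outgroup state".  Real
cladistic software searches over whole trees rather than scoring pairs.

```python
from itertools import combinations

# Hypothetical, simplified character codings (1 = present, 0 = absent).
CHARACTERS = [
    "lives_in_water",
    "fishlike_shape",
    "scales",
    "bony_skeleton",
    "nostrils_open_into_throat",
    "jointed_limb_bones",
]
TAXA = {
    "shark":    (1, 1, 1, 0, 0, 0),
    "salmon":   (1, 1, 1, 1, 0, 0),
    "lungfish": (1, 1, 1, 1, 1, 1),
    "cow":      (0, 0, 0, 1, 1, 1),
}
OUTGROUP = "shark"  # assumption: used only to decide which states are derived


def overall_similarity(a, b):
    """Count every character on which the two taxa agree."""
    return sum(x == y for x, y in zip(TAXA[a], TAXA[b]))


def shared_derived(a, b):
    """Count characters on which both taxa agree AND differ from the outgroup."""
    return sum(
        x == y and x != o
        for x, y, o in zip(TAXA[a], TAXA[b], TAXA[OUTGROUP])
    )


def closest_pair(score):
    """Return the ingroup pair with the highest score."""
    ingroup = [t for t in TAXA if t != OUTGROUP]
    return max(combinations(ingroup, 2), key=lambda pair: score(*pair))


print(closest_pair(overall_similarity))  # pairs salmon with lungfish
print(closest_pair(shared_derived))      # pairs lungfish with cow
```

With these made-up codings, salmon and lungfish agree on more characters
overall, but lungfish and cow share more derived states, which is the
turnaround the example is meant to show.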

Cladistic trees are actually a chain of innovations.  And this is in fact the
cladistic definition of "ancestry".  There is no equivalent of reconstructing
a hypothetical parent on the basis of phonetic equivalencies in cladistics.
In fact, there is generally speaking no reconstructing of parents in orthodox
cladistics at all.  How the dinosaur became a bird for example is NOT
reconstructed and nothing in between the dinosaur and the bird is
reconstructed.  Particular innovations are the only trail and backward
reconstruction to identify forward relationships is not part of the scheme.
(Though Padian in his review gives some reasonable space for dissenting, to a
small degree, from what he calls this 'hard-nosed' position about any
reconstruction.)

The problem with applying cladistics to historical linguistics as reflected
in IE scholarship (the one that the UPenn tree ran into) is that the two
methods differ so thoroughly in the nature of their data.

One can see this in a fundamental definition of cladistics which defines "the
unrooted tree" as one "for which the ancestor (= root) has not been
hypothesized...."
(A good glossary of cladistic terms by Michael Crisp at ANU is available on
the web at http://www.science.uts.edu.au/sasb/glossary.html.)

If you look on the web for commercial software that constructs cladistic
phylogenies, you will see that all of it is designed to construct "unrooted
trees" but not necessarily rooted trees.  The reason is that cladistic
analysis starts with unrooted trees.  The whole idea is to strip the data of
any assumptions regarding ancestry.  That is the exact opposite of using PIE
cognates as data, which would root the tree before any analysis is done.

One can see how this could create a problem in using traditional IE data in
cladistic analysis.

Reflexes based on PIE reconstructions and reconstructed PIE morphologies are
of course based on a hypothesized parent.  They cannot be used to build an
unrooted tree any more than DNA sequences reconstructed from a hypothetical
parent can (a celebrated error in early cladistics).

And without an unrooted tree, you lose the core of cladistic analysis - since
the rooting process is all about finding the best fitting actual ancestors
and thereby the direction of evolutionary change among the branches.
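
To make the unrooted/rooted distinction concrete, here is a minimal Python
sketch.  It is an illustration only: the four-taxon tree, the internal node
names "u" and "v", and the helper `root_at` are all assumptions made for this
example, and rooting on the branch to a chosen taxon is just one common way
of rooting, not necessarily what any particular package does.

```python
# An unrooted tree as an undirected adjacency map.  "u" and "v" are
# unnamed internal nodes; nothing here says which node is ancestral.
UNROOTED = {
    "shark":    ["u"],
    "salmon":   ["u"],
    "lungfish": ["v"],
    "cow":      ["v"],
    "u":        ["shark", "salmon", "v"],
    "v":        ["lungfish", "cow", "u"],
}


def root_at(tree, taxon):
    """Root the tree on the branch leading to `taxon`.

    Returns nested tuples: (taxon, everything_else), where nesting depth
    now reads as order of branching away from the root.
    """
    def descend(node, came_from):
        children = [n for n in tree[node] if n != came_from]
        if not children:          # a leaf: nothing further to unfold
            return node
        return tuple(descend(child, node) for child in children)

    neighbour = tree[taxon][0]    # a leaf has exactly one neighbour
    return (taxon, descend(neighbour, taxon))


# Same unrooted tree, two different rootings, two different stories
# about the direction of change.
print(root_at(UNROOTED, "shark"))  # ('shark', ('salmon', ('lungfish', 'cow')))
print(root_at(UNROOTED, "cow"))    # ('cow', ('lungfish', ('shark', 'salmon')))
```

The branching pattern is identical in both outputs; only the choice of root
changes which groups read as earlier and which as later.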

Thinking about it again, it would seem that morphology - so long as it is not
based on reconstructed *PIE morphology - might be appropriate data for a
cladistic analysis.  But the taxonomy would need to be thorough.  And the
results might be surprising.

Remember the significant amount of biological "morphology" (the body parts
and how they are put together) that the lungfish and the salmon share in
common, and how little either looks like the morphology of the cow.  But
it is a very few, very particular and not very obvious pieces of morphology
that group the lungfish and the cow together.  In cladistics it's not the
weight of evidence that decides ancestry; it can be one very small piece of
evidence.  And that could produce the same very unexpected results - like
perhaps grouping French with Armenian or Slavic with Basque.  But, after all,
surprises keep life interesting.

Hope this helps.

Regards,
Steve Long