The UPenn IE Tree (the stem)

Wed Sep 15 06:03:57 UTC 1999

In a message dated 9/14/1999 3:35:38 PM, mclasutt at brigham.net writes:

<<Actually, you're still clouding the issue of "innovating" versus
"non-innovating" in order, it seems, to label nodes.>>

No, I'm not clouding anything.  That IS how the nodes are labeled in this
tree, at least according to the way it's been described.

<<Assume a family tree as follows with lots of nodes and intermediate points
clearly labelled.  This should look quite similar to the UPenn IE tree.>>

It would be far less confusing to just use the UPenn tree.  You are causing
the confusion by finding some need to "assume" some other tree than the case
in point.  Specifically, you are assuming things that are not in the
description we've been given.  Start with this:

On 9/03/1999 12:39:20 AM, kurisuto at unagi.cis.upenn.edu wrote:

<<Let me say it again: there is no meaningful concept of a "main stem" in
this tree.  You keep on bringing this up, but it is just meaningless.  The
branchings in the tree represent unshared innovations;  no more, no less. >>

You go on, on the other hand, to describe a completely different tree that
has a stem where common innovations are going on in between the branchings.
If that kind of data were included in the UPenn tree, then in fact there would
be a stem, wouldn't there?  And significant events would be going on in that
stem.  But more than once we've been assured this tree is "stemless".
That's why you have to make up your own tree.

In connection with what's going on in the stem of your assumed tree, you
wrote:
<<That is not at all what the UPenn tree implies, nor is it what any tree
implies.>>

I don't think you can lump all trees together so casually.

<<There are innovations going on between A and A', but they don't lead to
language diversity because they affect the whole community.>>

And I've said as much.  If such innovations were included in the data, they
are not acknowledged in this Stammbaum.  But more importantly, you've lost
sight of the fact that the nodes supposedly represent a real-time event derived
from unreconstructed data: an "unshared innovation" that happened at a specific
time, reflected by a specific branching.  There may have been a thousand
innovations before the branching, but the only ones represented on this
"tree" are, in theory, the ones that give "relative" dates to the branchings
into daughter languages.

<<There are now more innovations between B and B', but they again don't lead
to diversity because they affect the whole community.>>

But they clearly don't affect the "whole community."  They do not include the
previous branching A'.  So now we have two sets of unshared innovations
offsetting A' - those that happened at the time A' supposedly branched and
those that happened afterward.

So were these two sets of "unshared innovations" part of the "data" that is
the supposed basis of the tree we are talking about?

Please understand my problem with this particular tree.  It's been put
forward as an empirically based, assumption-free computer-based analysis of
the data.

We've been told that no reconstructions were used in the raw data.  So are we
to assume that the program reconstructed not only the pre-attestation shared
innovations that represent the nodes, but also your unrepresented (B,
B') innovations?  Well, it had better have reconstructed both, because your
(B, B') innovations are also "unshared innovations" as far as A' is concerned.

We've been told that the only chronological information that was used was the
dates of attestation.  So how did the program determine "innovations" from
before the date of first attestation?  And how did it determine the
"relative" chronology of those innovations?  Did it have to go through the
painful path that Sean Crist took in his ci/ki example?  Does this very smart
program know that ki>ci "rarely" occurs in the world's languages?

We've been told that it is stemless, only reflecting relationships.  So how
did the program know what is an innovation and what is merely a vestige of
the previous state of the language in question?  And how did it
know which branching innovation came first?  (Please don't bring up the
comparative method, unless you know that this amazing program also does the
comparative method.)

When I asked about chronological assumptions, this is the reply I got:
on 8/19/1999 6:41:16 PM, kurisuto at unagi.cis.upenn.edu wrote:
<<The algorithm which the team used produces an unrooted phylogeny, i.e. it
does not compute what point in the phylogeny is the root.  If you picture
this flat phylogeny as a web made of string lying on a table, you could
pick the tree up at any node (including a leaf node) or at any point
between two nodes, and assign that point in the tree as the root.>>
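For readers who want to see concretely what "picking the tree up" at a node or between two nodes means, here is a minimal sketch in Python.  It is purely illustrative: the four-leaf tree, the internal node labels X and Y, and the function names root_at and root_on_edge are hypothetical, and this is not the UPenn team's algorithm or their data.  The unrooted tree is just a list of undirected edges; rooting it orients every edge away from the chosen point.

```python
from collections import defaultdict

# Hypothetical unrooted tree (NOT the UPenn result): X and Y are unlabeled
# internal nodes, the leaves are attested languages.
EDGES = [("X", "Hittite"), ("X", "Tocharian"), ("X", "Y"),
         ("Y", "Greek"), ("Y", "Latin")]

def root_at(edges, root):
    """Orient an unrooted tree away from `root`.

    Returns a dict mapping each node to the list of its children
    (neighbours visited in sorted order, so the output is deterministic).
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    children = {root: []}
    seen = {root}
    stack = [root]
    while stack:  # depth-first walk outward from the chosen root
        node = stack.pop()
        for nb in sorted(adj[node]):
            if nb not in seen:
                seen.add(nb)
                children[node].append(nb)
                children[nb] = []
                stack.append(nb)
    return children

def root_on_edge(edges, a, b, new_root="ROOT"):
    """Root the tree at a point between nodes a and b by splitting that edge."""
    target = {a, b}
    kept = [e for e in edges if set(e) != target]
    if len(kept) == len(edges):
        raise ValueError(f"no edge between {a!r} and {b!r}")
    return root_at(kept + [(a, new_root), (new_root, b)], new_root)

def undirected_edges(children):
    """Forget the root again: the set of undirected edges of a rooted tree."""
    return {frozenset((p, c)) for p, cs in children.items() for c in cs}

if __name__ == "__main__":
    print(root_at(EDGES, "Y"))
    print(root_at(EDGES, "Hittite"))   # rooting at a leaf is also allowed
    print(root_on_edge(EDGES, "X", "Y"))
```

Whichever root is chosen, undirected_edges returns the same set; only the direction of descent, and therefore any implied order of branching, changes.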

This is what is inherently misleading about the tree in question.  It
supposedly takes the data and only sets up rootless relationships, like the
Cambridge tree described by Larry Trask.  But in fact all the relationships
are based on deep assumptions about what happened when.  Those "unshared
innovations" must happen in a very specific order and along specific lines of
descent for the relationships to hold up.

Which brings me back to why I insisted that some line in the Stammbaum must
be "innovation-free."  It was solely to make the point that a stem (and a
root) was premised in the Stammbaum, whether it was admitted or not.

There are a limited number of innovations indicated on that tree.  They are
apparently the only ones relevant to what the tree is illustrating - the
supposed chronological "relatedness" of the languages.  My point was that
there was, in effect, a sequence of branches that did not "innovate" (within the
narrow set of innovations included in the Stammbaum) all the way down to the
last nodes, and that this would be the logical "stem," the branch-offs being
just that: innovations away from the stem.

The latest answer to this point, I believe, is that both lines coming out of
the node can be considered innovating.  That's convenient, but
chronologically absurd.  Unless both happened on the same day, the diagram
should show a branch off a branch, illustrating a significant
innovation/divergence in the "proto-language": the stem.

And more importantly this allows us to ask how the algorithm figured out,
without reconstruction, when and what this unnoted innovation was.  We can't
ask that question when that fact is buried in a "stemless" Stammbaum.

I hope I've brought up enough reasons above for you to reconsider your
conclusion that the "tree" in question is just like any other IE tree.

Regards,
Steve Long