The UPenn IE Tree (a test)

Thu Sep 2 16:07:01 UTC 1999

On Wed, 1 Sep 1999 X99Lynx at aol.com wrote:

(Your message is so long that I'm going to have to be selective about
which parts I respond to; sorry.)

> You might check whether all this makes sense with Mr. Tandy.  It should.

Just as a minor correction, her name is Tandy Warnow.

> But do understand that this Tree and its approach seems to have a lot of
> weight behind it (Ringe, UPenn, 'an algorithm developed to produce optimal
> phylogenies of biological species' and page 369 of Larry Trask's textbook -
> although he does not seem to necessarily endorse the approach.)

> This Stammbaum put in such a light might actually mislead people into
> thinking that there is a degree of certainty in this that the authors might
> not endorse.  I think that happened to me at one point.

I'd be the last person to say that we ought to judge an approach on the
basis of the prestige of its developers.  It ought to be judged on its own
merits without regard to who thought it up.

> In any case, whether or not this approach actually adds anything to our
> knowledge is not a given.  There's the question of the kind of data that it
> is based on, for example.  The relative weight put on the specific
> similarities and differences among the languages in calculating relatedness.
> I never did ask if reconstructed forms were included in the data.  The tree
> appears to be rooted chronologically, since PIE caps it, and that raises the
> question of how dates of attestation were handled.  And so forth.

To answer these questions:

-As I mentioned before, the characters were morphological, phonological,
and lexical.  The team would prefer to use strictly morphological
characters, but the problem is that there aren't enough of them, so they
have to flesh out their data set with other types of characters.

-I _think_ that all the characters were weighted equally; I could be wrong
on this point.  I don't remember either reading or hearing anything from
the team about weighting any characters more heavily, altho the
methodology certainly allows you to do so if you choose.

-Reconstructed forms were not included in the data.

-Dates of attestation were not taken into consideration at all when
producing the unrooted phylogeny.  It was produced strictly on the basis
of the characteristics of the languages without regard to dating.  _After_
this tree had been produced, the team did go on to produce a version of
the tree showing the earliest date of attestation for each language
against the branchings they had already worked out; it puts certain
constraints on when the posited branchings could have happened.
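
As a rough illustration of what that kind of constraint looks like (my own
toy sketch, not the team's code): a branching has to have happened no later
than the earliest attestation of any language descended from it.  The tree
shape, the leaf names A-D, and the dates below are all invented placeholders,
not real data.

```python
# Hypothetical toy tree: a leaf is a string, an internal node is a pair.
tree = ("A", ("B", ("C", "D")))

# Placeholder earliest-attestation dates (negative = BCE); NOT real data.
attested = {"A": -1600, "B": 700, "C": 100, "D": 900}

def latest_split(node, attested, bounds):
    """Return the earliest attestation date found under `node`.

    For every internal node, record that date in `bounds`: the branching
    at that node cannot be later than it, because by then at least one
    descendant is already attested as a distinct language.
    """
    if isinstance(node, str):          # leaf: just report its own date
        return attested[node]
    left, right = node
    bound = min(latest_split(left, attested, bounds),
                latest_split(right, attested, bounds))
    bounds[node] = bound
    return bound

bounds = {}
latest_split(tree, attested, bounds)
for node, date in bounds.items():
    print(node, "split no later than", date)
```

Note that the dates are only consulted after the tree already exists; nothing
in the sketch lets them influence the shape of the tree.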

> Here are two basic problems:

> You wrote:
> <<Yes, I agree that if the situation you describe were true, then the
> character-based approach could not arrive at the tree that you drew.>>

> I don't believe that's true.  You may have misunderstood.  What I wrote was
> that "the Stammbaum with its given assumptions, would not be able to reflect
> these events accurately."

> In fact, I believe that the hypothetical tree would LOOK EXACTLY THE SAME.
> (remember that the hypothesis is that you do not know that "Celtic1" was the
> actual parent.)  This is simply because you would have no way of knowing that
> what you are calling innovations are actually inherited and vice versa.  The
> attributes of filial "Celtic 6" - the first attested appearance - justifiably
> look like late innovations.  You would be totally justified in looking
> elsewhere for earlier indications of the parent.  The Stammbaum would be
> perhaps your best guess - given your ignorance.
>
> (Please read this with some care.)  This is not a specific fault in the
> Stammbaum or the approach.  This is the necessary degree of uncertainty we
> have about these past relationships.  The time of first attestation is all we
> have to go by.  So, in the extreme case of a parent that is understandably
> mistaken for a filial, we would have all the paths of descent wrong.

I think I might have been misunderstood here.  Suppose we grant your
premise: that a language remained unchanged over many centuries, while
various other languages branched off from it and innovated (if this is
true, then several of the internal nodes in the tree represent _exactly_
the same language).  At the end of the process, what linguists are left
with is the terminal nodes of a phylogeny, with no knowledge about the
internal structure of the tree.

What I'm saying is this: if we took character-based data from the various
daughter languages and ran the algorithm we've been discussing over them,
that algorithm would _not_ produce a tree where the same language is found
on multiple internal nodes within the tree (which is what you're
positing).  It would produce a tree where all the temporal stages of your
unchanging parent language are represented as a single node.
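
To make the point concrete, here is a toy sketch (mine, and emphatically not
the algorithm the team used).  It assumes a node is a pair of (character
states, list of children), with invented three-character state vectors, and
it simply absorbs any internal node whose states are identical to its
parent's.  Successive stages of an unchanging parent have no characters
separating them, so they fall together into one node.

```python
def collapse(node):
    """Merge every internal child whose states equal its parent's states."""
    states, children = node
    merged = []
    for child in children:
        child = collapse(child)
        child_states, grandchildren = child
        if grandchildren and child_states == states:
            # No innovation separates child from parent: absorb the child
            # and attach its children directly to the parent.
            merged.extend(grandchildren)
        else:
            merged.append(child)
    return (states, merged)

# Hypothetical history: parent P never changes; D1, D2, D3 branch off at
# three different times, each with one innovation of its own.
P = (0, 0, 0)
d1 = ((1, 0, 0), [])
d2 = ((0, 1, 0), [])
d3 = ((0, 0, 1), [])
survivor = (P, [])                      # the unchanged language, attested late
history = (P, [d1, (P, [d2, (P, [d3, survivor])])])

flat = collapse(history)
print(len(flat[1]))                     # number of branches from the one P node
```

The three stacked copies of P in `history` come out as a single node with
four branches; the character data give no grounds for keeping them apart.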

As for not being able to tell what's an innovation, this is just wrong in
the case of mergers.  I'll give a case of this below.

> I hope you see this.  As in genetics, the individual expression of a gene
> does not tell you by itself where it belongs in the line of descent.  That's
> because basically the parent gene looks exactly like the filial gene.  And
> "shared innovations" are just more localized genes, shared by fewer
> individuals, and they will tell you nothing about parentage until you have
> some way to assign a place in time to them. If you assign the wrong place in
> time, you will quite simply mistake the parent for the F generation and vice
> versa, through no fault of your own.

> You wrote:
> <<Naturally, it _can't_ have been the case that PIE looked like Celtic,
> because the other branches would have to undergo some impossible
> unmergings.>>

> I'd really, really ask that you give one example of that. I don't think it is
> true.  Of course, you should not use Grimm's law or similar prehistoric event
> as a dating mechanism, for the simple reason that is circular.  If, e.g.,
> Proto-Celtic were assumed to be PIE, it would not need to change the fact of
> Grimm's Law, but it might change its dating (which is in controversy in any
> case.)

I certainly can.  For example, PIE distinguishes three different series of
dorsal obstruents: the palatals (*k', *g', *g'h), the velars (*k, *g,
*gh), and the labiovelars (*kw, *gw, *gwh).  Only one IE language
preserves this three-way distinction intact (namely, Luvian of the
Anatolian group); all the others merge at least two of the series.

Celtic, like most of the European IE languages, merges the palatal series
with the velar series.  Indo-Iranian, on the other hand, merges the velars
and the labiovelars.

Celtic therefore cannot be the proto-language for the whole family.  There
is _no_ _way_ that the Celtic palatal/velar series could come to be
unmerged in exactly the same way in Anatolian, Indo-Iranian, etc.
Unmergers of this sort simply don't happen; mergers are irreversible.

(To be 100% honest, I'd need to qualify that last statement, but the
qualification would not be productive here; for the purposes at hand, the
statement as I gave it is true.)
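
The logic can be spelled out mechanically.  The following is only a schematic
sketch of my own: the outcome labels "K", "Kw", and "K'" are placeholders for
whole series rather than actual reflexes, and conditioning environments are
ignored entirely.  It asks whether one language's treatment of the three
dorsal series could be computed from another's outcomes alone.

```python
SERIES = ["palatal", "velar", "labiovelar"]

# Schematic outcomes of the three PIE dorsal series (labels are placeholders).
pie          = {"palatal": "K'", "velar": "K", "labiovelar": "Kw"}  # all distinct
celtic       = {"palatal": "K",  "velar": "K", "labiovelar": "Kw"}  # palatal = velar
indo_iranian = {"palatal": "K'", "velar": "K", "labiovelar": "K"}   # velar = labiovelar

def derivable(parent, daughter):
    """True if each parent outcome always maps to one daughter outcome."""
    mapping = {}
    for series in SERIES:
        p, d = parent[series], daughter[series]
        # If the same parent sound would need two different daughter
        # outcomes, the daughter requires an unmerger.
        if mapping.setdefault(p, d) != d:
            return False
    return True

print(derivable(pie, celtic))            # True
print(derivable(pie, indo_iranian))      # True
print(derivable(celtic, indo_iranian))   # False
print(derivable(indo_iranian, celtic))   # False
```

A parent with all three series distinct can yield either daughter, but
neither merged system can yield the other, since that would mean splitting a
single sound back along the old lines.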

> There should really be no unmerging problems, only a rearrangement of dates
> and directions of inheritance and a reassessment of what constitutes
> innovations.  YOUR RAW DATA STAYS THE SAME.  It's just the interpretation
> that changes.

That's true when you're producing a phylogeny of biological species.  It
doesn't work when you're talking about languages.

When we're talking about genetic innovations, it doesn't matter what order
the innovations happened in; and there's always the possibility of
back-mutation, etc.  Not so for human language.  As I just described in a
recent post, the innovations often have to have happened in a particular
order, because a different ordering would give the wrong results.
Further, there is no linguistic analog to biological back-mutation,
because once a phonological merger is done, it's done.  You can't undo it.

So in linguistics, you don't have the luxury that you have in biology of
being able to draw different trees for the same species based on different
choices about the temporal ordering.
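
Here is a minimal sketch of the ordering point, using two invented sound
changes (p > f and f > h) and an invented word; none of this is real data
from any language.

```python
def apply_changes(word, changes):
    """Apply unconditioned sound changes to a word, one after another."""
    for old, new in changes:
        word = word.replace(old, new)
    return word

P_TO_F = ("p", "f")   # hypothetical change 1
F_TO_H = ("f", "h")   # hypothetical change 2

word = "pafa"
order_a = apply_changes(word, [P_TO_F, F_TO_H])   # p > f first, then f > h
order_b = apply_changes(word, [F_TO_H, P_TO_F])   # f > h first, then p > f

print(order_a)   # haha
print(order_b)   # faha
```

The same two changes give different daughters depending on which came first,
so the attested forms themselves tell you the order.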

> Of course, you may argue that Hittite and Greek historically appears before
> Celtic and so must be assumed to be older.  That is fine.  But it is not
> linguistic evidence.  And to the extent that the Stammbaum and the approach
> is making any such assumptions, it is extra-linguistic.  And should be
> understood to be so.

I agree with this.  As it works out, if you impose the tree I gave on the
earliest dates of attestation for the various branches, it has to have
been the case that all of the branching took place prior to the earliest
attestation of an IE language, namely Hittite.  The exceptions are the
branchings of Germanic from Balto-Slavic, and the branching of Baltic and
Slavic, which could have happened after Hittite was attested.

> A completely different issue is this business of the stem.  You've described
> it many ways, but you still haven't accounted for something.  And that is the
> speakers - perhaps a majority - who are not part of the branch-offs through
> all the branchings.  If the branchings don't happen all at once, then there
> is a core still extant that these branches are coming from.  RIGHT DOWN TO
> THE LAST BRANCH-OFF.  This is a logical necessity.  (EXCEPT of course for
> the last branch-off!)  There must still be speakers who are NOT Tocharian or
> Italo-Celtic or Greek-Armenian after those languages are represented as
> branching off.

Let me say it again: there is no meaningful concept of a "main stem" in
this tree.  You keep on bringing this up, but it is just meaningless.  The
branchings in the tree represent unshared innovations;  no more, no less.

It's simply an accident of history that the branchings happened in such a
way that the tree is a bit lopsided in its branchings.  Suppose that
history had run differently; suppose that all the branchings occurred as
they did, but that the Anatolian and Tocharian branches had survived and
flourished down to the present, with a great many sub-branches in their
parts of the tree.  Let's say that Greek, Italic, Celtic, Balto-Slavic,
and Germanic had all died out prior to the advent of writing, leaving just
a few speakers of Sanskrit and Armenian up in the mountains somewhere to
represent the whole left side of the tree.

This would in _no_ _way_ change the meaning of the first two
branchings at the top of the tree (PIE branching to Anatolian and an
unlabelled node, and the unlabelled node branching to Tocharian and
another unlabelled node).  In this hypothetical world, someone like you
might argue that some line of descent thru the heavily ramified Tocharian
or Anatolian branch is the "main stem".  But this is meaningless in terms
of the criteria on which the tree was worked out.

> And that means those speakers should be speaking a language between the
> branch-offs that had an identity of its own.

It might or might not have a _culturally recognized_ identity of its own.
For example, if the speakers of Pre-Proto-Italo-Celtic were engaged in
trade with the speakers of Proto-Greco-Armenian-Indo-Iranian-Balto-Slavic-
Germanic, there was very likely a period where they considered their
language to be the same while being aware that there were differences, as
is the case for speakers of American and British English today.

On the other hand, it could be like modern Hindi and Urdu, which are the
same language, altho the speakers hotly deny it for political reasons.  We
don't know how these prehistoric groups felt about each other.  In terms
of the innovations within the branches, it doesn't matter.

> It is not equatable to earlier
> branch-offs and it cannot be equated with later branch-offs.  That core has
> to have a real existence, separate from the branches.  Otherwise your
> branches are all chronological daughters of each other, one after another.

> That is why representing that core in the Stammbaum is also a logical
> necessity.
> It represents speakers who are speaking a distinct LANGUAGE that isn't any of
> the branches until at least the final branch.

Yes, each of the internal nodes in the tree is intended to represent a
particular language which was spoken in some time and place.  For example,
when the tree shows a branching between the Greco-Armenian branch and the
Germanic-Balto-Slavic-Indo-Iranian branch, the claim we're making is that
there were two such languages being spoken somewhere.  The nodes are not
some abstraction; they represent actual posited prehistoric languages,
albeit unlabelled.

The point where I object is in calling any particular line of descent the
"stem".  No line of descent has any special status in the tree.

  \/ __ __    _\_     --Sean Crist  (kurisuto at unagi.cis.upenn.edu)
 ---  |  |    \ /     http://www.ling.upenn.edu/~kurisuto/
  _| ,| ,|   -----
  _| ,| ,|    [_]
   |  |  |    [_]