Answers regarding IE tree

Sean Crist kurisuto at unagi.cis.upenn.edu
Thu Oct 7 21:54:03 UTC 1999


On Mon, 4 Oct 1999, Richard M. Alderson III wrote:

> I've now read the technical report in which the early results were published.
> As it appears from Mr. Crist's comments on the UPenn tree, further work has
> continued to refine the original results.

> Is the list of characters published anywhere electronically accessible?
> I'd be curious to see what the definitions for the Hittite vs. the world
> codings were --it may very well be that a different view of laryngeals,
> for example, would change the outcome greatly.

Just for you, I asked Don Ringe these questions when I met with him today.
The set of characters is not yet available online.  It's one of the things
they plan to do, tho.  The first order of business for the team is to get
their monograph on this subject out the door, and they're trying to get it
done before Don leaves for sabbatical this coming spring.

As a side point (and this is me talking now, not Don), I don't think that
any of the characters relates directly to the presence of laryngeals.  The
standard view, which almost everybody accepts, is that there were three
laryngeals in PIE.  It wouldn't really help us to set up a character which
codes for the retention of laryngeals, since the _loss_ of laryngeals
could have been an independent innovation in every branch, and has to be
coded as such; hence, such a character would be of no probative value.
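
A toy illustration of that last point, in Python.  Nothing here comes from
the team's actual character set: the language list, the state labels and
the helper name is_informative are all invented for the example.  It
assumes the usual rule of thumb that a character can only help choose
between trees if at least two of its states are each shared by at least
two languages.

```python
from collections import Counter

def is_informative(coding):
    """Return True if at least two states are each shared by >= 2 languages.

    `coding` maps a language name to a state label (hypothetical data).
    """
    counts = Counter(coding.values())
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2

# Hypothetical coding: laryngeals retained in one language, and each loss
# treated as a separate innovation, so every language has its own state.
laryngeal_loss = {
    "Hittite": "retained",
    "Old English": "lost-1",
    "Latin": "lost-2",
    "Old Irish": "lost-3",
    "Greek": "lost-4",
}

# Hypothetical contrast: two states, each shared by two languages.
shared_innovation = {
    "Hittite": "A",
    "Old English": "B",
    "Latin": "B",
    "Old Irish": "A",
    "Greek": "C",
}

print(is_informative(laryngeal_loss))     # no state is shared
print(is_informative(shared_innovation))  # states A and B are both shared
```

Under that assumption, the first coding fits every possible tree equally
well, which is why it would tell us nothing.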

> I have a larger problem with the tree as a whole, now that I know more
> of the details:  Only one language from each sub-family was used to
> provide input, and I believe that *this* choice may very well have
> biased some results.  I would be much happier if the Italic and Celtic
> languages were not from the respective "Q" branches thereof.  Does any
> of the papers provide information on how long a run of the program to
> interpret the characters actually runs (rather than the theoretical
> O(<mumble>) specification)?  How much time would be added by data from
> other languages?

Using a very fast, expensive, state-of-the-art machine donated by Intel, a
single run of the algorithm over the current character set takes about
eight days. How long the run takes depends on how messy the data are
(i.e., how badly they deviate from a perfect phylogeny). If you take
Germanic out, the remaining tree is close enough to a perfect phylogeny
that the algorithm only takes about three days to run.

As I mentioned before, the algorithm is guaranteed to give you a perfect
phylogeny if there is one; but if there is no perfect phylogeny, the
algorithm is not guaranteed to give you the phylogeny with the best fit.
However, if the number of characters not conforming to the resulting tree
is small, you can do an exhaustive search on the relatively small
remaining space of possible optimal trees, and then you can be certain you
have the optimal tree. If the number of non-conforming characters is
large, however (as is the case if you include Germanic), searching this
space becomes intractable, because the search runs in exponential time.
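
To give a feel for why the cost blows up, here is a minimal sketch in
Python.  It is not the team's algorithm, and it solves a simpler, related
problem: it assumes two-state characters only, for which a perfect
phylogeny exists exactly when every pair of characters passes the
"four-gamete" test, and it looks for the smallest set of characters to
throw out so that the rest fit a perfect phylogeny.  The data and the
function names are made up for illustration.

```python
from itertools import combinations

def compatible(a, b):
    """Four-gamete test: two binary characters conflict only if all four
    state combinations (0,0), (0,1), (1,0), (1,1) occur across the taxa."""
    return len(set(zip(a, b))) < 4

def has_perfect_phylogeny(chars):
    """For binary characters, pairwise compatibility is sufficient."""
    return all(compatible(a, b) for a, b in combinations(chars, 2))

def smallest_discard_set(chars):
    """Try discarding 0, 1, 2, ... characters until the rest are compatible.

    The number of subsets tried grows exponentially with the number of
    characters that have to be discarded.
    """
    n = len(chars)
    for k in range(n + 1):
        for dropped in combinations(range(n), k):
            kept = [c for i, c in enumerate(chars) if i not in dropped]
            if has_perfect_phylogeny(kept):
                return dropped
    return tuple(range(n))

# Hypothetical data: four binary characters scored over five taxa.
chars = [
    (1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1),
    (1, 1, 1, 0, 0),
    (1, 0, 1, 0, 1),  # conflicts with each of the other three
]

print(has_perfect_phylogeny(chars))
print(smallest_discard_set(chars))
```

With one bad character the search is over after a handful of tries.  If
many characters have to go, the number of subsets to try explodes, which
is the same kind of trouble you get into once Germanic is included.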

Regarding your concern that the results might have been biased by
selecting only one language per major branch: this necessity is partly
forced upon us by the slowness of the algorithm (and mind you, the
algorithm in question is at the very cutting edge of the field in computer
science, so the required processor time is not something which can be readily
improved upon at our current state of knowledge).

A further consideration is that you have to have a pretty substantial
corpus to be able to code for a language in any useful way. I remember Don
Ringe saying in a talk that the team chose Old English over Gothic to
represent Germanic, for example, for this very reason.  He added that the
character encoding for Germanic would have been no different (other than
less complete) if they had used Gothic instead.

  \/ __ __    _\_     --Sean Crist  (kurisuto at unagi.cis.upenn.edu)
 ---  |  |    \ /     http://www.ling.upenn.edu/~kurisuto/
  _| ,| ,|   -----
  _| ,| ,|    [_]
   |  |  |    [_]


