Principled Comparative Method - a new tool

Sat Aug 21 05:35:46 UTC 1999

On Wed, 18 Aug 1999 X99Lynx at aol.com wrote:

<snip>

>jonpat at staff.cs.usyd.edu.au wrote further:

><<Our data has to be the word set in the parent form
>(reconstructed words or real words) and then one word set for the
>each daughter language and the set of phonological transformation
>rules between each parent and daughter for each word in their
>chronological sequence.>>

>I'm wondering if there isn't a possible flaw here in using <<the
>parent form (reconstructed words...)>>.

>Reconstructed words have already made assumptions about the
>relationship between the parent and the daughter languages.  In
>fact they are nothing but a presumed relationship between the
>daughter languages.

Steve is quite correct here.  A reconstruction is just that.  In
the best case it is only the statistically most probable
original relationship between the forms found in the daughter
languages.

><<If we have the cost of the messages for two parent-daughter
>pairs then the shorter cost represents the daughter that is
>closer to the parent. In the case of modern Cantonese and Beijing
>we got 35,243.58 bits and 36,790.93 bits respectively, indicating
>Cantonese is closer to the common parent, Middle Chinese, than
>Beijing. >>

>Depending on how much reconstruction of the parent you used,
>could this not be an artifact of the reconstructions?  In *PIE,
>certain aspects are considered the innovations of a particular
>daughter language because they do not appear in the other
>daughter languages, and are therefore factored out of the
>reconstruction.  If you only have two daughter languages - as you
>did above - how do you identify the innovation versus the
>original form in reconstruction?  And if you decide in favor of
>one or the other in reconstruction, it will show up in any
>further use of that reconstruction.

It is not so much a question of innovation versus preservation.
It is a matter of how much innovation there is in each daughter
language.  When you have the parent preserved, this can be
measured.  If there is no innovation the daughter form and the
parent form should be identical (thus answering the question of
which is closer).  But when you have to reconstruct the parent,
all of this information (degree of suspected innovation) will go
into the reconstruction.  If you have to reconstruct only a few
words, and the reconstruction has a high statistical probability
of being correct because it draws on information in the three
languages beyond the two forms of the word in the daughters, then
it will not have much effect on the measurement.  But if you have
to reconstruct many words, and the reconstruction is based only
on the forms found in the daughter languages, then you are
building the distance between them into the reconstruction, and
when you analyze it you will just get these distances back.

>In effect, you may to some degree be measuring how the
>relationship between the daughters has been perceived in the
>reconstructions that you use, as much as anything else.

Again correct.  So long as you have three independent data points
(the parent and two daughters) you can objectively determine the
distance between any two of them.  But when one of the data
points is determined solely by the position of the other two, you
cannot determine anything but the distance between those two (you
cannot determine a triangle given only the length of one side).
The third point is not a point but a locus (of all points from
which it is possible to reach the other two points).  Where this
point is placed on the locus already reflects the perceived
distance of the reconstructed parent form from its daughters.
Playing it back from the other direction just gives you back what
was put into it.  It is circular.
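
The locus argument can be made concrete with a small sketch.  It is
only an illustration under stated assumptions: it uses plain edit
distance rather than the message-length cost discussed above, and the
two daughter forms and the candidate parents are made-up strings, not
real data.  Every candidate lies on a shortest edit path between the
daughters, so the two parent-daughter distances always add up to the
daughter-daughter distance.  Only the split changes, and the split is
decided by whoever picks the reconstruction.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Toy daughter forms, invented for illustration (not real language data).
daughter_a = "tapa"
daughter_b = "dabo"

# Hypothetical candidate parents, each one substitution further from A.
candidates = ["tapa", "dapa", "daba", "dabo"]

between_daughters = levenshtein(daughter_a, daughter_b)

# For each candidate: (form, distance to A, distance to B).
splits = [(c, levenshtein(c, daughter_a), levenshtein(c, daughter_b))
          for c in candidates]

for form, to_a, to_b in splits:
    print(f"{form}: A={to_a} B={to_b} sum={to_a + to_b}")
print("distance between daughters:", between_daughters)
```

With these toy forms the daughters are 3 edits apart, and the four
candidates divide that as 0+3, 1+2, 2+1 and 3+0.  Which daughter
comes out "closer to the parent" depends entirely on which candidate
was chosen.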

>I would think that the method you describe would be much more
>functional if it at least triangulated daughter languages.  And
>avoided using prior reconstructions - proving itself on its own,
>so to speak.

If this means what I think it does (determining the distance
between three daughter languages rather than two daughters and
a parent) it might be useful for calibration of the method, but
it still doesn't solve the problem of locating the parent.  It
just moves the problem from two dimensions to three.  However, if
you measure the distance between three daughter languages using
each in succession as the node, it may give you a better idea,
statistically, of where the parent should be located (based on
your assumptions about distance between the daughters and the
reconstructed parent).
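
A minimal sketch of using each daughter in succession as the node,
again assuming plain edit distance as a stand-in for the real cost
measure and using invented forms for three daughters.  The labels A,
B, C and the helper distances_from_node are hypothetical names
introduced only for this example.  Each daughter is taken in turn as
the reference point and the other two are measured against it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Toy forms for three daughters, invented for illustration only.
daughters = {"A": "tapa", "B": "dabo", "C": "dapa"}


def distances_from_node(node: str, forms: dict) -> dict:
    """Distance from the chosen node to every other daughter."""
    return {name: levenshtein(forms[node], form)
            for name, form in forms.items() if name != node}


# Take each daughter in turn as the node and total its distances.
totals = {node: sum(distances_from_node(node, daughters).values())
          for node in daughters}

for node in daughters:
    print(node, distances_from_node(node, daughters), "total:", totals[node])
```

The node with the smallest total is the most central of the three.
That is at best a hint about where a parent might sit.  It does not
locate the parent, for the reasons given above.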

On the other hand, if this means measuring the distance from the
parent to three daughter languages instead of just two, all this
will do is increase the statistical probability of a correct
reconstruction if one is needed.  In general, the more daughters
you add, the more confidence you can have in the statistical
validity of the reconstruction.  Again, if you have all of the
languages preserved, the measurements should be quite good.  But
the method is not about eliminating innovation from the
reconstruction; it is about measuring the amount of innovation
across the daughters.  The problem is that if the parent form is
not available, it is not possible to determine how much of what
is seen in each of the daughter forms *is* innovation, which is
what you are trying to measure.

So if the measurement involves a minimal amount of reconstruction
(and especially if the reconstruction is based in part on factors
other than just the forms of the words in the daughter languages),
I would expect the measurements to be quite valid.  But if the
measurement is based on a completely reconstructed parent
language, all you are going to get out of it is what was put into
the reconstruction.  Of course, the more daughter languages you
can measure, the more confidence you may get in the statistical
probability of the reconstruction.  Using more daughter languages
will also help to reduce the likelihood of the daughters having
innovated the same way independently (historical linguists really
hate it when this happens because it screws everything up).

And using the measurements obtained from the reconstruction may
provide quantitative ideas about where there are problems and/or
inconsistencies in the reconstruction.  This is because it is a
different way of looking at the reconstruction and looking at
things in different ways will often produce new insights.  But
there is no way to separate the distances between the daughters
from the reconstruction of the parent because that's what the
reconstruction is.

Bob Whiting
whiting at cc.helsinki.fi