the Chinese study
Jon Patrick
jonpat at staff.cs.usyd.edu.au
Fri Sep 10 11:02:08 UTC 1999
[ moderator re-formatted ]
I wrote
<<In *PIE, certain aspects are considered the innovations of a particular
daughter language because they do not appear in the other daughter
languages, and are therefore factored out of the reconstruction. If you
only have two daughter languages - as you did above - how do you identify
the innovation versus the original form in reconstruction?>>
jonpat at staff.cs.usyd.edu.au wrote:
<<If I understand "innovation" correctly it has to be represented by a rule of
insertion from a null position. That's not a problem; it's just another
rule at a particular point in the Relative Chronology. The algorithm will
process it correctly.>>
This would appear to be a moot point from what you said above. But just to
be clear: I'm not sure where your algorithm starts, but my point was
simple.
We find a state B(F1) and C(F1). Languages B and C differ by the use of,
say, one phoneme alone. Otherwise they are identical and coeval. We assume
and reconstruct a parent A. The lone phoneme difference between B and C
creates an unknown: whether the phoneme in B is from the parent or whether
the phoneme in C is from the parent. If we conclude B is identical to the
parent, then C carries the "innovation." (Forget about dual innovations
for now.)
Based on the above there is no statistical certainty at all in choosing B
over C or vice versa. I think you will see that the issue here is not the
"insertion from the null position" but how that insertion decision affects
the reconstruction. Reconstructions should work backward in time. So if
"insertion" = "innovation", it presumes that the "inserted" data was not in
the parent. But in fact we are completely uncertain about that. (But again
you are not reconstructing.)
OK, I think I understand your question and the problem statement.
I would frame your question a different way. Firstly you have the issue of
what is the optimal reconstruction for a given child. So you construct
multiple putative parents for a child and use our method to choose the
reconstructed parent that best fits the data. You repeat this process for the
sister language and so arrive at two reconstructed parents, one for each
daughter. Then you determine the distance of each daughter from the other's
parent. Whichever parent gives you the least cumulative cost would be the
preferred parent. I will speculate (but get back to you later on the matter)
that the message length for describing the pair of daughter languages for each
putative parent is directly additive because you are merely describing the
cost of one followed by the cost of the other. (FOOTNOTE: there could be some
coding strategies that might be usable to compress the message lengths, say
for example merging the two PFSAs of the daughters for each of the putative
parents).
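The comparison described above can be sketched in a few lines of Python. This is a toy only: counting mismatched phonemes stands in for the message length, it is not the PFSA-based coding mentioned in the footnote, and the word lists and function names are hypothetical. It shows the additive cost for each putative parent, and that in the one-phoneme case the two parents tie.

```python
# Toy illustration only: mismatch counting stands in for message length.
# The word lists and function names are hypothetical, not from the study.

def mismatch_cost(parent, daughter):
    """Count phoneme positions where the daughter differs from the parent."""
    return sum(
        p != d
        for p_word, d_word in zip(parent, daughter)
        for p, d in zip(p_word, d_word)
    )

def total_cost(parent, daughters):
    """Additive cost: the cost of one daughter followed by the cost of the other."""
    return sum(mismatch_cost(parent, d) for d in daughters)

# Two daughters that are identical except for a single phoneme.
B = ["pater", "mater", "tres"]
C = ["fater", "mater", "tres"]

# Each daughter is tried as the putative parent of both.
costs = {
    "parent = B": total_cost(B, [B, C]),
    "parent = C": total_cost(C, [B, C]),
}
print(costs)
```

Under this symmetric toy cost the two totals are equal, so nothing in the data favours B's phoneme over C's as the original.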
Does this answer your question? Not directly. I think the answer is in your
own words. There is no discriminatory information in the data as described
that a coding strategy could exploit to give you a choice of solutions.
Since you also said that your approach can only compare two reconstructions,
this may not be a problem for you, although you will not be able to reduce
the uncertainty in the example above no matter how many reconstructions you
test, because two alternative reconstructions will not necessarily make one
of the choices better than the other.
An off-the-cuff answer is that I agree with you. Remember, however, that our
method relies on a reasonably sized data sample. So if the innovation is rarely
used in the sample, it will contribute little to discriminating the models.
Single occurrences of rules make little contribution to discriminating between
competitive parents (no pun intended).
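A rough way to see why sample size matters, sketched in Python under stated assumptions: suppose each occurrence of a rule is independent and is somewhat more probable under one putative parent than under the other. The probabilities 0.6 and 0.4 and the function name are hypothetical, chosen only for illustration. The difference in message length then grows linearly with the number of occurrences.

```python
from math import log2

def evidence_bits(occurrences, p_under_parent_1, p_under_parent_2):
    """Message-length difference in bits favouring parent 1 over parent 2.

    Assumes independent occurrences, each with the given (hypothetical)
    probability under the respective parent.
    """
    return occurrences * log2(p_under_parent_1 / p_under_parent_2)

single = evidence_bits(1, 0.6, 0.4)     # rule seen once in the sample
frequent = evidence_bits(50, 0.6, 0.4)  # rule seen fifty times
print(f"{single:.2f} bits vs {frequent:.2f} bits")
```

A single occurrence shifts the comparison by well under one bit, while fifty occurrences shift it by fifty times as much. When the two probabilities are equal the difference is zero no matter how large the sample, which is the situation in the one-phoneme example.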
It may seem trivial in terms of the work you are doing. But this
fundamental uncertainty in any reconstructive process can yield very
different results in subsequent analysis using those reconstructions as a
basis.
I don't have any sensitivity to the strength of this comment as my historical
linguistic knowledge is limited, so I shall accept it on your word.
Jon Patrick