Principled Comparative Method - a new tool

Jon Patrick jonpat at staff.cs.usyd.edu.au
Sun Sep 5 09:12:21 UTC 1999


    Date:       Sun, 22 Aug 1999 20:03:41 -0400 (EDT)
    From:       Sean Crist <kurisuto at unagi.cis.upenn.edu>

    On Thu, 12 Aug 1999, Jon Patrick wrote:

    > The idea is that the distance between languages is represented by the
    > series of changes that occur to a large set of words in moving from
    > their parent form to their daughter forms, so that distance apart is not
    > measured between the daughter languages but rather by their distance
    > from their parent. We feel this better represents the real world
    > process.
    > Our data has to be the word set in the parent form (reconstructed words
    > or real words) and then one word set for each daughter language and
    > the set of phonological transformation rules between each parent and
    > daughter for each word in their chronological sequence. Hence we are
    > modelling the rules and their sequence of application for each word. The
    > extent to which any of this information is hypothetical merely defines
    > the hypotheses one is comparing, but importantly it does not affect the
    > computational method we apply to this data.

    This post certainly caught my interest, because I've got various ideas
    myself about how computers could be better used in language
    reconstruction.  In a very general way, I think we have some of the same
    interests.

    I do have some comments about your specific approach.  If I understand
    correctly, you're measuring language 'distance' at least partially in
    terms of how many historical phonological rules a language has undergone
    since it first diverged from some reconstructed ancestor: the more rules,
    the greater the distance.  (I hope I haven't just plain misunderstood; if
    so, the following may not apply.)

Not exactly: the distance is also a function of how frequently the rules are
used and of how consistently they are used in the context of other rules.
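
To make that concrete, here is a minimal sketch in Python of the kind of data
the method consumes, with toy word forms and a deliberately simplified,
frequency-sensitive cost standing in for the real measure. Each daughter word
is paired with the chronological sequence of rules asserted to derive it from
its parent form; the rule names and forms below are illustrative only, not our
actual dataset.

import math
from collections import Counter

# Each daughter word is annotated with the chronological sequence of
# phonological rules (illustrative names) asserted to derive it from
# its reconstructed parent form.
derivations = {
    "*kmtom -> hund":     ["grimms_law", "vowel_reduction"],
    "*pater -> fadar":    ["grimms_law", "verners_law"],
    "*bhrater -> brothar": ["grimms_law", "vowel_reduction"],
}

def naive_cost(rule_sequences):
    """Frequency-weighted cost: frequent, consistently used rules are
    cheap to encode; rare ones are expensive (a crude stand-in for a
    proper information-theoretic measure)."""
    counts = Counter(r for seq in rule_sequences for r in seq)
    total = sum(counts.values())
    return sum(-math.log2(counts[r] / total)
               for seq in rule_sequences for r in seq)

print(naive_cost(derivations.values()))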

    I think the basic problem your approach raises is this: how do you count
    historical phonological changes?  For example, is the Great Vowel Shift in
    English one rule, or a dozen?  It looks like your distance measure will
    depend a great deal on what choices you make on such questions.

It is determined by the dataset you use, not by the algorithm itself. Remember
that our method works only on the rules one asserts to exist for particular
words in a language, and on the changes those words have undergone. Major
movements will be identified in the relative chronology by their high
frequency in the data. The effect will be seen first in the canonical PFSA,
but more strongly in the optimised PFSA, as nodes with high frequencies of
converging and diverging arcs. It is true that our method will depend on the
actual words you use: if the sample of words poorly represents the Great Vowel
Shift, then it will not appear strongly in the final result. I'm sure such a
result does not surprise anyone.
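
For readers who want to see what the canonical machine looks like, here is a
small Python sketch (again with invented rule names) that builds the canonical
PFSA as a prefix tree over the rule sequences, keeping a traversal count on
every arc. A pervasive change would surface as arcs with high counts. The
optimisation step that merges states into the optimised PFSA is not shown.

from collections import defaultdict

def canonical_pfsa(rule_sequences):
    """Build the canonical PFSA: one branch per distinct rule sequence,
    shared prefixes merged, with a traversal count on every arc."""
    arcs = defaultdict(int)          # (state, rule) -> frequency
    transitions = {}                 # (state, rule) -> next state
    next_state = [1]                 # state 0 is the start state

    for seq in rule_sequences:
        state = 0
        for rule in seq:
            arcs[(state, rule)] += 1
            if (state, rule) not in transitions:
                transitions[(state, rule)] = next_state[0]
                next_state[0] += 1
            state = transitions[(state, rule)]
    return arcs, transitions

arcs, _ = canonical_pfsa([
    ["grimms_law", "vowel_reduction"],
    ["grimms_law", "verners_law"],
    ["grimms_law", "vowel_reduction"],
])
for (state, rule), freq in sorted(arcs.items(), key=lambda kv: -kv[1]):
    print(f"state {state} --{rule}/{freq}-->")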

    The rule count is going to depend in part on what phonological theory
    you're working in.  A traditional historical grammar of a language often
    lists a multitude of small rules which a modern theory can conflate into a
    shorter list.  Exactly how short you can make the list partly depends on
    what phonological theory you're working in. There may indeed be no
    phonological rules at all in the traditional sense;  phonological change
    could all be just the reranking of constraints, which is what I'm assuming
    in my in-progress dissertation.

You are correct in perceiving that our method is susceptible to the
interpretation of what constitutes a rule. We had this problem with the
Chinese data in deciding how to deal with allophones, and so we ran two
different analyses, one that considered allophones and one that didn't. There
is also the problem of separating rules that ALWAYS come together, which can
therefore be treated as a single rule. I argue that if you can identify a
rule, no matter how small, then include it: my principle of "don't cut out
what you don't know the function of" (like appendices).

Another perspective on this question is my own view that linguists don't know
their data as well as they think they do. The jump to generalisations is too
quick for my liking. My position was vindicated in the Chinese data, where we
found far more items than the linguist expected that were exceptional by his
own criteria (another experience that tells me not to accept the rigid Trask
criteria for defining the vocabulary suitable for the study of early Basque).
Also, treating small rules that are supposed to have 100% correlation as
separate rules does no violence to the final PFSA if the correlation is true,
and should not produce any identification of false structures. The only
difference is that the absolute size of the PFSA will be larger than it would
otherwise be, which does no harm, as the number means little by itself and has
to be understood in comparison with an alternative pairing. Should the small
rules be present in one mother-daughter pair and not the other, then their
presence in the model is vital; if they are present in both pairs, then they
contribute nothing to the discrimination between the two pairs.
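
That last point is easy to check mechanically on a toy example. In the Python
sketch below (same illustrative representation as above, with a crude state
count standing in for the full model cost), splitting a rule into two
micro-rules that always apply together, in both mother-daughter pairs,
enlarges both canonical machines by the same amount, so the difference between
the two hypotheses is untouched.

def trie_size(rule_sequences):
    """Number of states in the canonical (prefix-tree) PFSA."""
    states = {()}
    for seq in rule_sequences:
        for i in range(1, len(seq) + 1):
            states.add(tuple(seq[:i]))
    return len(states)

def split_micro(data):
    """Replace "lenition" by two micro-rules that always apply together."""
    out = []
    for seq in data:
        new = []
        for rule in seq:
            new.extend(["lenition1", "lenition2"] if rule == "lenition"
                       else [rule])
        out.append(new)
    return out

pair_a = [["lenition", "apocope"], ["lenition", "syncope"]]
pair_b = [["lenition", "apocope"]]

before = trie_size(pair_a) - trie_size(pair_b)
after = trie_size(split_micro(pair_a)) - trie_size(split_micro(pair_b))
print(before, after)   # identical: micro-rules shared by both pairs
                       # do not discriminate between them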
cheers
Jon
______________________________________________________________
The meaning of your communication is the response you get
