Principled Comparative Method - a new tool

Jon Patrick jonpat at staff.cs.usyd.edu.au
Mon Sep 6 04:03:59 UTC 1999


 On Wed, 1 Sep 1999 13:31:19 -0400 (EDT), Sean Crist said:



    Okay- I know enough about automata to know in a general way that the arcs
    might not match up in a neat way with the way we'd represent the processes
    as high-level, ordered rules.  In a previous job, I had to write automata
    to produce conjugations and declensions in modern German and Japanese, and
    it was certainly true that not all the arcs corresponded in a neat way to
    the units that a linguist would ordinarily want to talk about.

    The discussion was about a particular application of probabilistic
    automata to measure "distance" (whatever that means) among related lects.
    I'm wondering if you or someone else could give me a simple f'rinstance
    to illustrate how this methodology works in detail, and what it's supposed
    to accomplish.  I haven't seen this methodology before, other than on this
    list.  I'm interested, but I don't understand it yet.

I'll try. Let's say you build a probabilistic automaton from a data input
stream. In our case each input "sentence" is a set of rules describing the
relative chronology of a word pair. As you traverse the PFSA, building new
nodes where needed, you leave behind on each transition arc an increment to
the counter for the number of times you have travelled that arc. You return
to the start node when each word-pair rule set is used up. Once all rule
sets have been counted into the automaton you have the canonical PFSA; that
is, it will parse all the rule sets in your data sample and none others.
Note that the transition counts give an estimate of the probability of
transitioning out of each state for those rules. Using the principles of
Information Theory we can compute the length of the message you would need
to optimally encode a description of that PFSA. We talk of that quantity as
a "cost" or a "message length" - the terms are synonymous. So initially we
can give you the "cost" of the canonical PFSA. However, further
investigations can be made.
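The construction just described can be sketched in a few lines of Python.
This is only an illustrative toy, not the actual implementation: the class
name, the "#" end-of-sequence marker, and `arc_code_length` (which computes
only the data-encoding part of the cost, -sum log2 p over arc traversals,
not the full structural message length) are all my own assumptions.

```python
from math import log2

class PFSA:
    """Toy probabilistic finite-state automaton built from rule sequences.
    State 0 is the start node; each arc carries a traversal count, so the
    counts estimate the transition probabilities out of each state."""

    def __init__(self):
        self.arcs = {}       # (state, symbol) -> [next_state, count]
        self.next_state = 1  # state 0 is reserved for the start node

    def add_sequence(self, symbols):
        """Thread one rule set through the PFSA, building new nodes where
        needed and incrementing the counter on every arc travelled."""
        state = 0
        for sym in list(symbols) + ["#"]:  # "#" returns to the start node
            if (state, sym) in self.arcs:
                arc = self.arcs[(state, sym)]
                arc[1] += 1
                state = arc[0]
            else:
                if sym == "#":
                    target = 0
                else:
                    target = self.next_state
                    self.next_state += 1
                self.arcs[(state, sym)] = [target, 1]
                state = target

    def arc_code_length(self):
        """Bits to encode the arc traversals: -sum n * log2(n / N_state).
        This is only the data part of the cost; a full MML account would
        also charge for describing the PFSA's structure."""
        totals = {}
        for (s, _), (_, n) in self.arcs.items():
            totals[s] = totals.get(s, 0) + n
        return sum(n * -log2(n / totals[s])
                   for (s, _), (_, n) in self.arcs.items())
```

For example, feeding in the two sequences ["a", "b"] and ["a", "c"] yields
five arcs, and the only coding cost comes from the state where "b" and "c"
compete (one bit each).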
It may well be that parts of the PFSA are identical, in which case, in
terms of minimising the cost of describing the PFSA, it would be better to
merge a few states and get a shorter message length. The disadvantage is
that the merged PFSA will now allow a greater complexity in the data than
the canonical PFSA does. From our point of view, if merges shorten the
message length/cost, then the original form of the PFSA was not justified
statistically and deserved to be compacted. Note that the data is not
discarded; only the form of its representation in the PFSA is altered.
Once you have the message lengths for two competing explanations of the
data, the difference in cost represents how much better one explanation of
the data is than the other, or alternatively, how much more structure one
has revealed in the data. The value of the difference can be thought of
approximately as an odds ratio: a difference of 10 bits is odds of 2^10:1
(2 to the power of 10) in favour of the shorter message over the longer
message.
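The merge-and-compare step can also be sketched as a toy. The helper names
(`data_bits`, `merge_states`, `odds_ratio`) are hypothetical, arcs are
reduced to (state, symbol) -> count since only the counts matter for this
part of the cost, and the structure cost that merging actually saves is
left out; the point is simply that pooling two states with identical
behaviour leaves the data cost unchanged while shrinking the machine, and
that a saving in bits converts to odds as 2^bits.

```python
from math import log2

def data_bits(arcs):
    """Bits to encode the arc traversals: -sum n * log2(n / N_state).
    arcs: {(state, symbol): count}; arc targets are omitted because only
    the counts enter this part of the cost."""
    totals = {}
    for (s, _), n in arcs.items():
        totals[s] = totals.get(s, 0) + n
    return sum(n * -log2(n / totals[s]) for (s, _), n in arcs.items())

def merge_states(arcs, keep, absorb):
    """Pool the outgoing arcs of state `absorb` into state `keep`,
    summing the counts on matching symbols."""
    merged = {}
    for (s, sym), n in arcs.items():
        key = (keep if s == absorb else s, sym)
        merged[key] = merged.get(key, 0) + n
    return merged

def odds_ratio(longer_bits, shorter_bits):
    """Approximate odds in favour of the shorter (cheaper) explanation:
    a 10-bit saving is roughly 2^10 = 1024 to 1."""
    return 2 ** (longer_bits - shorter_bits)
```

Merging two states that each emit "a" and "b" equally often halves the
number of arcs to describe without changing the data cost, so the full
message (structure plus data) gets shorter; if the total saving were 10
bits, `odds_ratio` would report 1024:1 in favour of the merged machine.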
May I suggest that your work with Japanese & German used automata to
identify states - the "production" part is not part of automata theory but
rather a separate task that the automata were merely used to control,
perhaps.
cheers
Jon Patrick
______________________________________________________________
The meaning of your communication is the response you get


