Principled Comparative Method - a new tool

Sean Crist kurisuto at unagi.cis.upenn.edu
Wed Sep 1 17:31:19 UTC 1999


On Tue, 31 Aug 1999 X99Lynx at aol.com wrote:

> Which brings up an interesting question.  Why use 'semantics'?

> After all, in the usual presentation of the Comparative Method, the meaning
> of the word is just really a way of squaring up different languages.  First
> of all, to make a guess about whether they may be related.  And secondly
> "meanings" - generally dictionary/glossary meanings - just line up the
> languages in a convenient way so that phonology can be more easily compared.

> We can after all imagine two hypothetical languages where every word in one
> is phonologically cognate with a corresponding word in the other - and even
> have clear historical proof of this total cognation - but at the same time
> find in usage none of these cognates are 'semantically' similar in any way
> that is apparent.

If this were the case, we wouldn't judge the words to be cognate.
Semantics are _not_ just a convenient heuristic for finding cognations; to
be judged cognate, the words have to 1) be phonologically derivable by
regular sound changes from the proto-language, and 2) have meanings which
could plausibly have developed from some meaning in the proto-language.

For example, suppose we had a word in Language A meaning 'river', and a
word in Language B meaning 'leather'.  Suppose that the phonological form
of the words is such that they _could_ be derived from a single word in
the proto-language via regular sound changes which have already been
established.  Nevertheless, we would almost certainly not judge these
words to be cognate, because it is almost inconceivable that there could
have been any semantic developments which would produce the meanings
'river' and 'leather' from any imaginable original meaning in the
proto-language.

> So might there not be, with the mega-statistical probabilities created by a
> mega-data base, a way to avoid the whole issue of meaning?  The co-occurrence
> of sound categories in well-populated distributions should yield high degrees
> of statistical certainties (so long as you got the dates right).

So if I understand you, you're saying that there could be some
probabilistic approach which allows you to conduct language reconstruction
without reference to the semantics of the words being compared?  How do
you know they're cognate, then?

> <<If there's some other way of going from attested forms to reconstructions
> of prehistoric forms without using the Comparative Method, I'd like to know
> about it.>>

> I believe internal reconstruction is one often mentioned.  Typographical
> inference is another.

You're right; I should have mentioned Internal Reconstruction.  What do
you mean by "typographical inference"?

> <<The only reason we're able to say anything at all about
> prehistoric languages is that sound changes have a particular property,
> namely, they are exceptionless (with a small amount of hand-waving here).
> The Comparative Method crucially exploits this property of sound changes.>>

> Now, to be fair, what you are speaking about is a working assumption.  (In
> fact, the most productive thing about Grimm's Law may have been the
> 'exceptions'.)

Yes, it's quite true that the Neogrammarian Hypothesis (exceptionlessness
of sound changes) is a working assumption.  It's an assumption which seems
warranted, and further, it is the crucial assumption which allows us to
conduct the Comparative Method at all.  If it's false, all bets are off.
It would leave us with no currently known and reliable methodology for
engaging in language reconstruction at all.

> But it is a little silly to say that the only thing we can say
> about prehistoric languages is that their sound changes were exceptionless.

I agree it would be silly.  I never said any such thing.

> In fact, by definition, we don't know anything directly about the sounds of
> prehistoric languages.   So we don't know, by definition, if the sound
> categories included exceptions or not.  But we have decrypted prehistoric
> languages without any knowledge of what sounds the characters represented.

If they were written, then by definition, they are not prehistoric.
"Prehistoric" means "before writing".  But you're quite right that we can
say a lot about the phonological categories of a language without knowing
the specific phonetic values for those categories.

> <<Automata normally perform a concatenation operation across each arc between
> states.  One can imagine an automaton-like machine where the transitions can
> perform other sorts of operations,....  But if the machine in question is
> strictly concatenative (as automata at least canonically are), I'm puzzled
> as to how you would model historical sound change in such a machine, since
> historical sound change isn't concatenative.>>

> It's a little like looking at the pistons in a car engine and asking which
> one will get you to Chicago.  You are assuming a point-for-point analogy
> between the internal system or structure used by the automaton and the
> external structure it is being applied to analyze.  The "linkages" in
> "concatenative" do not have to mirror the elements you are analyzing.  They
> are rather internal relationships yielding values that mathematically
> correspond to but do not have to structurally mirror the values you've
> attached to external events.

> /a/>/a/ may correspond to a single "link" in your concatenation.  /a/ > /b/
> may correspond to six, even though your real-life event may correspond to
> only one.  Those six links represent values you have assigned to /a/ > /b/,
> which the machine achieves any way it must in order to match the operations
> required.  'Invisible' intermediate formulae in a spreadsheet are a good
> example.

Okay, I know enough about automata to know in a general way that the arcs
might not match up in a neat way with the way we'd represent the processes
as high-level, ordered rules.  In a previous job, I had to write automata
to produce conjugations and declensions in modern German and Japanese, and
it was certainly true that not all the arcs corresponded in a neat way to
the units that a linguist would ordinarily want to talk about.

The discussion was about a particular application of probabilistic
automata to measure "distance" (whatever that means) among related lects.
I'm wondering if you or someone else could give me a simple f'rinstance
to illustrate how this methodology works in detail, and what it's supposed
to accomplish.  I haven't seen this methodology before, other than on this
list.  I'm interested, but I don't understand it yet.

> <<Whether or not loans happened in the light of written history, you can
> identify a word as a loan from a related language because of the sound
> changes it has and has not undergone.  For example, while English "cardiac"
> does ultimately go back to the PIE word for "heart", you can readily tell
> that it is a loan from a non-Germanic language, because it has not undergone
> Grimm's Law, which applied exceptionlessly in prehistoric Germanic.>>

> Unless of course you are among the number of linguists (no small number) that
> find Grimm's Law representing archaisms, in which case you must find another
> path for the loan.

Are you talking about the Glottalic Theory (i.e., the relatively recent
view which gives a radically different obstruent inventory for PIE)?  If
so, I explained in detail in a recent post why I think this hypothesis is
wrong.  But if you object to this, we can come up with some other case
which doesn't make reference to Grimm's Law to illustrate the point that
you can identify a word as a loan word based on the sound changes which it
has or has not undergone.
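
The "cardiac" case quoted above can be sketched as a toy check in Python.
This is only an illustration under loud simplifications: segments are
plain ASCII, only the consonant skeleton is compared, the PIE palatovelar
is written as plain "k", and the names GRIMM, shift and classify are
hypothetical helpers invented for this sketch.

```python
# Toy check: has a word's consonant skeleton undergone Grimm's Law?
# Assumption: simplified ASCII segments; vowels and all other sound
# changes are ignored.

GRIMM = {
    "p": "f", "t": "th", "k": "h",    # voiceless stops > voiceless fricatives
    "b": "p", "d": "t", "g": "k",     # voiced stops > voiceless stops
    "bh": "b", "dh": "d", "gh": "g",  # voiced aspirates > voiced stops
}

def shift(skeleton):
    """Apply the Grimm's Law mapping to every segment of a skeleton."""
    return [GRIMM.get(segment, segment) for segment in skeleton]

def classify(proto_skeleton, word_skeleton):
    """Say whether a word matches the shifted or the unshifted skeleton."""
    if word_skeleton == shift(proto_skeleton):
        return "shifted"      # consistent with inheritance within Germanic
    if word_skeleton == proto_skeleton:
        return "unshifted"    # consistent with a loan from outside Germanic
    return "unclear"

# Consonant skeleton of the PIE word for "heart" (simplified).
PIE_HEART = ["k", "r", "d"]

print(classify(PIE_HEART, ["h", "r", "t"]))  # English "heart"
print(classify(PIE_HEART, ["k", "r", "d"]))  # English "cardiac" (kard-)
```

"heart" comes out as shifted and "cardiac" as unshifted, which is the
sense in which the word betrays itself as a loan.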

Here's another example, if you like.  In Gothic, we've got the ordinary
Germanic word _waurkjan_ "to work".  There's also a noun _waurstw_,
meaning "work, deed".  We can tell that this second word is a loan from
Slavic, because it has an /s/ rather than a /k/, i.e. the satem consonant
shift, which did not apply in Germanic; hence, this can't be a native
word.

> But, in one very important definitional sense,  every
> word in modern English is a "loan" word.  What, for example, is not a loan
> word in Old French, if 'Frankish' is described as a "different language?"

It is just not true that every word in modern English is a loan word.
Some of the words in English are inherited in straight descent from
Proto-Indo-European and are not borrowed from anywhere.

> <<This same method of identifying loans among related languages works just as
> well for languages which don't have a long written tradition.>>

> Just as well, eh?  No added element of uncertainty at all caused by a lack of
> writing?  Have you tried your hand at finding the loans in Thracian?

We weren't talking about languages for which we have no data.  We were
talking about modern languages such as the Polynesian languages for which
you can get as much data as you like (just go out and interview people),
but which don't have the many centuries of written tradition that e.g.
English has.

> <<Now, it's true that there is a problematic case: it's hard to detect loans
> which occurred between related languages soon after their branching, before
> very many of the telltale sound changes took place.>>

> There is also the problematic case where loans went back and forth without
> documentation or were loaned from a third language of which we have an
> incomplete record.

As long as the loans in question aren't so early that there hadn't been
any identifying sound changes yet within the individual branches, we
should still be able to identify such words as loan words as I described.

> And another where the chronology of the loan is based on
> erroneous historical information, so that the giver and taker have been
> confused.

The loans we were talking about were prehistoric loans, i.e. loans which
occurred before the languages came to be written.  So I don't see the
connection here with historical information.

> And another where the inherent arbitrariness of sound changes (why
> p>f?) can suggest relationships where commonalities are purely accidental.
> Etc.

It's true that we're not able to say why particular sound changes
happened, but I don't see how this "suggests relationships where
commonalities are purely accidental."

> By the way, do you think there was an intermediate period between p>f where
> there was /p'h/?   Just curious?

There's no evidence to answer that question one way or the other.  We can
say with fair certainty that voiceless stops were not _contrastively_
aspirated in Pre-Proto-Germanic, but there is always the possibility that
they were _phonetically_ aspirated, as is the situation in modern English.
We also know that it's a natural development for a voiceless aspirated
stop to develop into a voiceless fricative, as happened e.g. in Greek.  So
it's possible that the pre-Grimm voiceless stops were phonetically
aspirated, but we can't motivate this view.

> <<You don't need a long written tradition to be able to work out the relative
> chronology of prehistoric sound changes.>>

> We have trouble being sure of the continuity of atomic half-lives, the
> constancy of gravity and the accuracy of radio-carbon dating.  Surely, you
> might take a slightly less certain tone about the chronology of prehistoric
> sound changes.  A certain humility seems to be a characteristic of the better
> scientist.  After all, you never know when an IE Rosetta Stone or a Quantum
> Physics of Linguistics may show up and demand the humility you can voluntarily
> adopt beforehand.

Let me give you an example here.  Suppose that some proto-language has the
syllables /*ki *ke *ka *ko *ku/.  Suppose that one of the daughter
languages first palatalizes *k before *i, giving /*ci *ke *ka *ko *ku/,
and then merges *e into *i, giving the attested forms /ci ki ka ko ku/.

So suppose you're the linguist trying to figure out which of the rules
happened first.  If they had applied in the other order, they would have
given /*ci *ci *ka *ko *ku/.  But this isn't what we find, so we can say
with reasonable certainty that palatalization applied first, and then the
vowel merger.
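
The ordering argument above can be checked mechanically.  Here is a
minimal Python sketch which applies the two hypothetical rules in both
orders and compares each result with the attested forms.  The function
names and the ASCII spelling of the syllables are assumptions made for
the illustration.

```python
# Toy model of the relative-chronology argument: apply the two sound
# changes in both possible orders to the hypothetical proto-syllables.

def palatalize(syllable):
    """*k > c before *i (the hypothetical palatalization rule)."""
    return "c" + syllable[1:] if syllable.startswith("ki") else syllable

def merge_e_into_i(syllable):
    """*e > i in all positions (the hypothetical vowel merger)."""
    return syllable.replace("e", "i")

def apply_in_order(rules, syllables):
    """Apply each rule in turn to every syllable."""
    for rule in rules:
        syllables = [rule(s) for s in syllables]
    return syllables

PROTO = ["ki", "ke", "ka", "ko", "ku"]
ATTESTED = ["ci", "ki", "ka", "ko", "ku"]

palatalization_first = apply_in_order([palatalize, merge_e_into_i], PROTO)
merger_first = apply_in_order([merge_e_into_i, palatalize], PROTO)

print(palatalization_first, palatalization_first == ATTESTED)
print(merger_first, merger_first == ATTESTED)
```

Only the order with palatalization first reproduces the attested forms;
with the merger first, both *ki and *ke end up as "ci".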

Of course, there are a lot of cases where the evidence isn't this clear,
and the relative chronology you work out has to be more tentative.  But
there are many, many cases where we have to say that the rules applied in
a certain order and not some other order, because another ordering would
simply give the wrong results.

  \/ __ __    _\_     --Sean Crist  (kurisuto at unagi.cis.upenn.edu)
 ---  |  |    \ /     http://www.ling.upenn.edu/~kurisuto/
  _| ,| ,|   -----
  _| ,| ,|    [_]
   |  |  |    [_]


