What is Relatedness?

Mon Jan 24 17:56:40 UTC 2000

On Sun, 16 Jan 2000 X99Lynx at aol.com wrote:

> One of the criticisms that came up often when I asked some linguists to
> comment on the tree was that it mixed apples and oranges.  And cherries.
> Syd Lamb wrote in a private message (reproduced with permission) that lexical,
> phonological and morphological items "are largely independent of one
> another in how they are involved in change. ..."

That's not true.  For example, if a language inflects with suffixes and
then undergoes phonological changes eliminating certain segments
word-finally, this can result in the loss of morphological contrasts
(thus, phonological and morphological change are not independent).
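
A toy sketch of this first example, in Python.  The forms, the case labels,
and the sound change are all invented for illustration; they are not data
from any real language.  The only point is that a purely phonological rule
(deletion of a word-final consonant) reduces the number of distinct forms in
a suffixing paradigm.

```python
import re

def drop_final_consonant(form):
    """Toy sound change: delete a single word-final consonant, if there is one."""
    return re.sub(r"[^aeiou]$", "", form)

def distinct_forms(paradigm):
    """Count how many different surface forms the paradigm distinguishes."""
    return len(set(paradigm.values()))

# Hypothetical paradigm: case is marked only by the suffix.
before = {"nom": "domas", "acc": "doman", "dat": "domai"}

# Apply the phonological change to every form.
after = {case: drop_final_consonant(form) for case, form in before.items()}

print(distinct_forms(before), before)
print(distinct_forms(after), after)
```

Under these assumptions the nominative and accusative fall together while
the dative, which ends in a vowel, is untouched.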

Or to take another example, phonological change can help bring about
lexical change; for example, "queen" and "quean" used to be pronounced
differently, but after the ee/ea merger a few centuries ago, "quean" began
to disappear from the language because of an unacceptable homonymy (thus,
phonological and lexical change are not independent).  Homonyms do exist,
but languages seem to disprefer them.

> "...Hence a tree constructed on
> purely phonological grounds, for IE or Uto-Aztecan, or any complex
> family, will come out quite different from one constructed on purely
> lexical grounds, and both will differ from one constructed on purely
> morphological grounds."

The assumption you made was that lexical, morphological, and phonological
change operate independently of one another _within_ a language.  Here,
you're talking about something else: you're talking about interaction
_between_ languages.

> There's no question that morphology can reflect different genetic
> relationships than lexical or phonological items.

I wouldn't use the term "reflect" here, since that indicates that we're
talking about some reality here.  A language has one genetic affiliation
and not another.

It's true that lexical items sometimes muddy the waters in determining
this genetic affiliation.  We can _usually_ detect loan words on the basis
of sound changes which a word has or has not undergone.  However, if a
borrowing is early, it can sometimes chance that the sound changes in the
donor and recipient languages have been such that a particular word would
have the same outcome in both languages, in which case the loan is
undetectable.
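
Here is a minimal sketch of that reasoning in Python.  The rule set is a
drastic simplification loosely modelled on the first Germanic consonant shift,
and the proto-forms and attested forms are invented; none of it is taken from
Ringe, Warnow, and Taylor's materials.  A word is treated as a possible loan
when it fails to show the sound change it should have undergone.

```python
# Toy word-initial sound changes (an assumption for illustration only).
SOUND_CHANGES = [("p", "f"), ("t", "th"), ("k", "h")]

def expected_reflex(proto_form):
    """Return the form an inherited word should have after the toy changes."""
    for old, new in SOUND_CHANGES:
        if proto_form.startswith(old):
            return new + proto_form[len(old):]
    return proto_form  # no rule applies: the word passes through unchanged

def looks_inherited(proto_form, attested_form):
    """True if the attested form is what regular inheritance would produce."""
    return attested_form == expected_reflex(proto_form)

# Hypothetical data: (proto-form, form attested in the recipient language).
words = [("pater", "fater"),   # shows the shift: consistent with inheritance
         ("pater", "pater"),   # lacks the shift: flagged as a likely loan
         ("mater", "mater")]   # no rule applies: a loan here would go unnoticed

for proto, attested in words:
    print(proto, attested, looks_inherited(proto, attested))
```

The third pair is the pitfall: since no change would have affected the word
in either language, the test cannot tell a borrowing from an inherited form.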

This is a potential methodological pitfall which needs to be acknowledged.
Acknowledging it is a very different thing from claiming that a _correct_
account can involve a language having one genetic affiliation for its
lexicon and another for its morphology; that idea is simply incoherent.

> This is precisely
> what happened with Germanic in the UPenn exercise.  It was pointed out
> in another post that "there are a great many morphological and
> constructional elements to be found in the earliest Indoeuropean texts
> that are in opposition to apparent lexical similarities and
> dissimilarities.  One example is the accusative of specification
> discussed by Hahn in 'Naming Constructions in Indoeuropean languages' ...
> This kind of morphology and syntax points in a very different direction
> than mere homonyms."

> It was also pointed out that the UPenn "outcome showed that their lexical
> choices in German were older than their morphological choices,..."  so that
> it looks as if "the Germans got their words from Latin and Celtic first and
> their syntax from Slavic later."  Suggesting perhaps that the "problem is
> trickier than it looks."

This sounds like a somewhat mangled version of what Ringe, Warnow, and
Taylor actually said.  First of all, their study did not include syntactic
characters; it included morphological, phonological, and lexical
characters.  Historical change in syntax is not yet understood well
enough for syntactic characters to have been included.  In any case, the
team would most definitely not claim that "the
Germans got...their syntax from Slavic"; I'm sure I'd be representing Don
Ringe correctly to say that he'd vehemently disagree that syntax can be
borrowed.

What the team said up until recent versions of their work is that Germanic
largely agrees with Balto-Slavic in terms of its morphology, but partially
agrees with Italic and Celtic in terms of lexical characters.  What they
posited is that Germanic started out as a _genetic_ sister of
Balto-Slavic, which accounts for the morphological characters on which
they agree; but that it borrowed a number of lexical items from Italic and
Celtic at such an early date that the words are not detectable as
loans on purely phonological grounds.

However, in even more recent runs of the algorithm over the most recent
version of the character table, the team found that the situation is worse
than that: on different runs, Germanic pops up in different places in the
tree.  If you leave Germanic out, the same tree comes up over and over;
but the placement of Germanic is simply a problem.  This is something of a
mystery at present.

> Does all this reflect the possibility that an accurate and true morphological
> IE tree would and should look different than an accurate and true lexical IE
> tree?

No, because a word is either borrowed or inherited.  The borrowed words
don't tell us anything about the genetic affiliation of a language, so we
try to exclude them.  The mismatch which sometimes occurs between trees
computed over the two types of characters comes from the _methodological_
problem of not always being able to detect what's a loan word.

> I was also given an example that I will try to repeat related to how
> conservatism might affect the usable evidence of relatedness:
> Two IE languages - possibly in contact - accidentally retain a feature from
> PIE.  All other IE languages lose that feature before any records exist.  The
> researcher would be forced to conclude that this feature is a shared
> innovation, having NO WAY OF KNOWING of the PIE origins.  He would have no
> way of knowing that it should go in the "lost" category for the other
> languages.  On that basis, the two languages might have a common "character"
> in the UPenn analysis and show evidence of relatedness.  But in fact all that
> is being measured is the relative conservatism of the two languages.

Ringe, Warnow, and Taylor would code the other cases as "absent"; with a
few exceptions, they generally don't make any claim as to whether this
represents a loss.

> (As far as non-borrowability of "syntactical morphology" goes, I was given
> the example of the overwhelming and extensive use of the original Latinism
> <-tion> in modern English.  And the comment was that if that sort of thing
> showed up in two ancient languages with no recorded history to explain how it
> got there, "some historical linguists would probably say that it had to come
> from the proto-language.")

-tion is a _derivational_ affix; I have not seen the use of the term
"syntactic[al] morphology".  Derivational morphology can be borrowed, but
borrowing of inflectional morphology (e.g. suffixes for verb tenses, noun
case markers, etc.) is virtually unknown.  It's for this reason that
inflectional morphology is so valuable in determining the genetic
affiliation of a language.

The sort of case you bring up can be a problem, but it's not nearly as
hopeless as you make it out to be.  Even without any knowledge of the
history, it's obvious that English has a huge stratum of words borrowed
from Latin, because the words don't show the signs of having gone through the
Germanic sound changes.  We can often detect this kind of borrowing even
in ancient languages.

> Which brings me back to the point of my original post.  A language borrows
> and innovates radically.  To the point where let's say - to make the point
> clear to the most obtuse degree - 99% of all lexical and morphological
> features are not from the original parent.

A large amount of borrowing can _obscure_ the genetic affiliation of a
language, making it harder for the linguist to determine.  This is a very
different thing from saying that the genetic affiliation has changed.

> This misses the point.  The point is simply evidentiary.  The radically
> changing language could have little or no evidence left of its genetic
> affiliation.  CRITICAL EVIDENCE OF THAT 'GENETIC' AFFILIATION HAS BEEN LOST.
> (And whether the borrowings happened before or after the sound changes is not
> relevant here - the borrowings are the bulk of the language and all the
> evidence you have.)

> Saying that these attributes fall into the "lost" category on the UPenn grid
> just won't do.  And that's simply because there is no way of knowing if they
> were lost or if they were ever there.  Something that is "lost" looks and
> acts exactly like something that was never there.

> The UPenn tree uses some 300 features across all of IE and some 4000 years.
> Is it possible that the absence of some of those features in some languages
> is not due to recent innovations but losses in other languages?  Is it
> possible that things that are categorized as "lost" in the UPenn  grid were
> never there?

It's for this reason that the team usually uses the neutral term "absent"
rather than "lost".  They do occasionally say "lost", but for the purposes
of their methodology, it makes no practical difference.  Either way, they
code each language where an item is "absent" with its own unique state of
that character.
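
A rough sketch of what such a coding does, in Python.  This is my own
illustration, not the team's software; the function names, the language
labels A-D, and the data are all hypothetical.  Every language lacking the
item gets a state that no other language shares, so an absence can never by
itself group two languages together.

```python
from itertools import combinations

def code_character(cognate_classes):
    """Map each language to a character state.

    cognate_classes maps a language name to a cognate-class label, or to
    None when the item is absent.  Absent languages each get a state that
    is unique to that language.
    """
    states = {}
    for lang, cls in cognate_classes.items():
        if cls is None:
            states[lang] = ("absent", lang)   # unique per language
        else:
            states[lang] = ("present", cls)   # shared by cognate languages
    return states

def pairs_sharing_state(states):
    """Return the pairs of languages that this character groups together."""
    return {frozenset((a, b))
            for a, b in combinations(states, 2)
            if states[a] == states[b]}

# Hypothetical character: A and B have cognate forms, C and D lack the item.
states = code_character({"A": 1, "B": 1, "C": None, "D": None})
print(pairs_sharing_state(states))
```

Only A and B end up sharing a state; C and D are not counted as agreeing
with each other, whether the item was lost or was never there.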

It's quite true that these absences (whether due to loss or due to
original absence) represent noise in the data which makes it harder to
determine the correct tree.  If you're trying to determine the correct tree
for a language family, this noise is something that you have to contend
with, no matter what methodology you use.

Despite the noise, the results are encouraging: a very coherent pattern
arises (minus the problem with Germanic): multiple runs of the algorithm
bring up the same tree again and again.

> PS - Someone - I forget who - also wanted to know how the UPenn tree analysis
> could claim to "confirm the Indo-Hittite hypothesis"  when the actual
> language used was Luwian.

Ah- it was you who asked this.  I already answered in a separate post, but
here are the answers:

1) The team did use Hittite, not Luvian, to represent Anatolian.  I don't
know where the idea came from that they used Luvian.

2) It wouldn't matter anyway.  Everyone agrees that Luvian and Hittite are
members of the Anatolian branch of IE.  Despite the misleading
terminology, the "Indo-Hittite" hypothesis holds that the earliest
branching in IE was between _Anatolian_ (not just Hittite) and
Proto-Everything-Else.

  \/ __ __    _\_     --Sean Crist  (kurisuto at unagi.cis.upenn.edu)
 ---  |  |    \ /     http://www.ling.upenn.edu/~kurisuto/
  _| ,| ,|   -----
  _| ,| ,|    [_]
   |  |  |    [_]