Cambridge and Greenberg's methods

Fri Aug 27 15:00:43 UTC 1999

On Wed, 25 Aug 1999 ECOLING at aol.com wrote:

> I am not sure what Cambridge's "unrooted trees" are, other than
> as a graph-theoretic term that the direction of change is unspecified,
> because no node is singled out as an "origin".

That is broadly correct.
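
To make the graph-theoretic point concrete, here is a minimal Python sketch. The four languages A-D, the internal nodes x and y, and the helper `root_at` are all invented for illustration; none of this is taken from the Cambridge group's work. The sketch shows only that a single unrooted tree fixes the groupings but not the direction of descent: picking a different node as the "origin" reverses which internal node is the ancestor of which.

```python
from collections import defaultdict

# Hypothetical unrooted tree: languages A-D, internal nodes x and y.
# Edges are undirected; no node is singled out as an origin.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]

def root_at(edges, root):
    """Orient an unrooted tree away from `root`; return a child -> parent map."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node]:
            if nxt not in parent:  # not yet visited
                parent[nxt] = node
                stack.append(nxt)
    return parent

rooted_at_x = root_at(EDGES, "x")
rooted_at_y = root_at(EDGES, "y")

# Same edges, opposite direction of change along the x-y edge.
print(rooted_at_x["y"], rooted_at_y["x"])
```

In both rootings A and B still hang off x, and C and D off y; what changes is whether x descends from y or y from x, and that is exactly the information an unrooted tree withholds.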

> In addition to that, the use of unrooted trees may also be a way to
> acknowledge in part the positions of those who suggest we should be
> giving much more consideration to dialect networks, areal phenomena,
> etc. etc. than to binary trees.  It is a perfectly legitimate
> position that we are forced to that at greater time depths, where it
> is harder to distinguish borrowings from genetic inheritances
> (where, at sufficient remove, borrowings actually become genetic
> inheritances for most practical purposes).

But unrooted trees are incompatible with genetic relationships -- or, at
least, they have nothing to say about these.

> In fact, I think the lack of major cleavages in Greenberg's Amerind,
> that is, everything except Athabaskan and Eskimo-Aleut, is virtually
> the same conclusion as having an unrooted tree or dialect network or
> even, WITHIN that more limited context, not linking the families of
> which it is composed at the highest levels.

No.  First, Greenberg *does* recognize major cleavages, even though he
expresses some diffidence about them.

Second, unrooted trees are wholly incompatible with what G thinks he's
doing.  He does rooted trees, and only rooted trees.

> AND, notice, it could also simply be an expression of an inability of
> Greenberg's methods to penetrate deeper, to distinguish at such a depth
> between neighbor-influences such as borrowing and genetic inheritance.
> Perhaps here there really is so much noise that Greenberg's method of
> judgements from data sets cannot yield much.

Well, I have no quarrel with *this*. ;-)

> I do not claim to know. But that is NOT the same as saying I
> conclude anyone should completely discount Greenberg's estimates.

Why not?

> [LT]

>> Fine, but then G's methods do not suffice to set up language families --
>> even though that is exactly what he does.

> No he does not.

Look.  You don't have to take my word for it.  Ask Greenberg.  He will
tell you flatly that he is setting up language families.  Why do you
keep misrepresenting him so grievously?

Just what do you think Greenberg means when he uses terms like
`Amerind', `Indo-Pacific' and `Khoisan'?

> Greenberg's language families (family trees)
> are an expression precisely of separations just as much as of unions.
> The two are equivalent, under the assumption that we cannot know
> about absolute truth of language relationship.

And this too is emphatically *not* Greenberg's position.

> I will readily admit that Greenberg should have REPEATED more
> often and more clearly that his method assumed ultimate relatedness,
> and merely dealt with different degrees of closeness, that his method
> did not purport to prove relatedness of two particular language families.

Maybe you think that's what he should be doing, but it is definitely not
what Greenberg thinks he is doing.

> [LT]

>> Suppose two languages A and B are genuinely but distantly related.
>> In this case, it is at least conceivable that false positives (spurious
>> matches) would be counterbalanced by false negatives (the overlooking of
>> genuine evidence).

>> But suppose the two languages are not in fact related at all.  In this
>> case, false negatives cannot exist, because there is no genuine evidence
>> to be overlooked.  Hence the only possible errors are false positives:
>> spurious evidence.  And the great danger is that the accumulation of
>> false positives will lead to the positing of spurious relationships.
>> Many of G's critics have hammered him precisely on this point.

> The point I made about random noise had NOT to do with whether
> particular languages are ultimately related, but whether a given language
> or family W is more closely related to others X or Y.
> In that context, why should "random" (by definition) noise in the data
> selectively favor W to X rather than W to Y?  No possible reason that
> I can imagine.

But this only holds good if you assume in advance that all languages are
related.  This is exactly the assumption which I chided you for earlier,
and it is also exactly the assumption which you have just told me in an
off-list posting that you do not hold.  So what is going on?

> Greenberg's methods are much more robust when applied as he applies
> them, to estimating which language families are more closely related,
> than they would be if they were applied to try to yield a conclusion
> about two language families being absolutely related (vs. not related).
> This is almost always misunderstood, and Trask's switch between the
> two kinds of questions in the discussion quoted above
> of the effects of random noise
> seems to indicate that he has not seen this either.

I have switched nothing.  You have.

Greenberg is interested only in absolute relatedness, and he says so.

> Trask seems to approve the Cambridge use of Algorithms,
> and to discount Greenberg's judgements of similarities.
> He calls the one "rigorous" and the other "highly informal".

Correct.

> I don't think either is necessarily better than the other.
> The assumptions built into each can systematically bias the
> results, and such bias will be an increasing problem for BOTH
> with increasing time depth and increasing noise in the data.

But the Cambridge work is mathematical and fully explicit, so it can be
tested.  G's work is neither, and it cannot be tested.  The Cambridge
group state exactly what they are counting and how; Greenberg does not.
The *only* criterion involved in Greenberg's work is Greenberg's
opinion.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk