Random Noise - quite different questions?

ECOLING at aol.com
Thu Sep 2 03:36:58 UTC 1999


I confess that I do not entirely understand the reasoning used
by John McLaughlin in his message on this subject today.
That is not an oblique criticism; it means only what
it says.  I would appreciate it if the logic and assumptions were
laid out in greater detail.  I promise not to be offended if some
of it seems exceedingly elementary.

I do attempt one interpretation below, based on the
clues I have, to make sense of it for myself.
But it involves an assumption about Multilateral Comparison
which I do not share.

Because of the following phrasing:

>In other words, we have SIX times as much random
>noise by doubling the number of languages involved in the comparison.

I fear that we are discussing quite different questions.
If Random Noise is expressed as a percentage of the data
available, then it does not increase when there are more languages,
it is by definition constant, at whatever percentage was specified.
So John's reasoning would seem
to require some way of getting results which is not based on
proportions but is based on absolute quantity of noise?
His reasoning would seem to regard the positive data used
by the method of Multilateral Comparison
as the data generated by the random noise,
rather than the data which escaped the random noise.
This seems to suppose that lookalikes which some method
suggests to be plausible potential cognates
(a method which McLaughlin is considering)
are GENERATED by the random noise just as much as they are
survivals recognizable DESPITE random noise.
With criteria for linking lookalikes so loose as to
have no principles behind them, of course we would have
anything compared with anything, and then John's conclusion would hold.
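
One possible reading of the "SIX times" figure, offered only as an
assumption about what John meant: if every pair of languages is compared,
going from 2 languages to 4 raises the number of pairs from 1 to 6.
The sketch below contrasts that absolute count of chance lookalikes with
noise expressed as a proportion of the comparisons made. The function
names, the item count, and the chance rate `p_chance` are hypothetical.

```python
from math import comb


def pair_count(n_languages: int) -> int:
    """Number of distinct language pairs among n_languages."""
    return comb(n_languages, 2)


def chance_matches(n_languages: int, items: int, p_chance: float) -> float:
    """Expected chance lookalikes if every pair is compared on every item.

    Assumes (hypothetically) one fixed chance rate per pair per item.
    """
    return pair_count(n_languages) * items * p_chance


def chance_share(n_languages: int, items: int, p_chance: float) -> float:
    """Chance lookalikes as a proportion of all pairwise comparisons made."""
    total_comparisons = pair_count(n_languages) * items
    return chance_matches(n_languages, items, p_chance) / total_comparisons


# Illustrative numbers only: 100 items, 5% chance rate (both invented).
for n in (2, 4):
    print(n, pair_count(n), chance_matches(n, 100, 0.05), chance_share(n, 100, 0.05))
```

Under these assumptions the absolute count of chance lookalikes grows with
the number of pairs, while the share stays at whatever rate was specified,
which is the difference between the two questions.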

But my experience is the reverse, so far,
if we are careful to evaluate how strict our judgments
of lookalikes are.
When Alexis Manaster-Ramer proposed a possible
counter-example a year or so ago, saying that Zuni
could be linked with IE just as closely as with Amerind,
I pointed out that the phonetic resemblances he was permitting
when linking Zuni with IE
were in fact much looser
than the ones being permitted to link Zuni with Amerind.
I do NOT believe my judgments were at all colored by
a preference one way or the other; they were simply based
on "nearness" or "minimal steps of change" or "most plausible
steps of change" to get from a common proto-form to the two
items being considered as conceivable cognates,
in Greenberg's sets of lookalike linkages,
and in Manaster-Ramer's sets of lookalike linkages.

All that is needed in this case of Multilateral Comparison is the conclusion
that the steps needed to link some proto-form to both Zuni and other Amerind
are fewer or less rare, on average,
than the steps needed to link Zuni and IE.
That is a fairly precise statement of how Multilateral Comparison
works, but of course with many languages.
Fewer steps or differences equals a closer potential relationship
(as a working hypothesis to be investigated further,
including by other methods).
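
A minimal sketch of the "fewer steps" comparison, assuming that plain
edit distance between two written forms can stand in for the number of
steps of change. That is a crude assumption: it counts every step
equally and ignores whether a step is common or rare. The word pairs
are invented placeholder strings, not forms from any real language.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def mean_steps(pairs) -> float:
    """Average edit distance over a list of (form_1, form_2) pairs."""
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


# Invented placeholder strings, NOT real data.
tight_set = [("pata", "pada"), ("kumi", "kum"), ("sela", "sila")]
loose_set = [("pata", "fos"), ("kumi", "gamen"), ("sela", "hul")]

print(mean_steps(tight_set), mean_steps(loose_set))
```

The set with the lower average would be the closer candidate, as a
working hypothesis only. A weighted distance, with costs reflecting how
common each step is, would be nearer to what is described above.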

Still needed is much more work on what are common and rare
phonetic steps, what are common and rare semantic steps,
what are common and rare typological structure steps, etc.

Testing and *calibrating* Multilateral Comparison
even on families we already know to be genetically related
(via the Comparative Method for example)
can actually help us to develop more exact knowledge on these
kinds of steps.

***

To contrast the views on random noise,
let us start with a situation in which there had been only
regular sound changes, but no random replacements which
make identification of cognates any more difficult
than typical regular sound changes would do.

Then we would, in the ABSENCE of random noise,
have a certain number of lookalikes which were good enough
to rank as provisional possible cognates, and to be included
in the tallies for relative closeness or divergence of some
group of languages being considered.

Now add random noise.  It should in this view DECREASE
the number of lookalikes which would be recognizable
to whatever algorithmic or human-judgmental method is
used, by removing some which otherwise would have been
present and found linkable by that method.

If it is truly random, it should not change the RELATIVE
RANKINGS of closeness vs. divergence,
it should only decrease the closeness overall.
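
The point can be sketched in expectation, assuming every lookalike is
lost to noise with the same probability regardless of which languages
it links. The pair labels, the counts, and the noise rate are invented
for illustration.

```python
def expected_survivors(lookalikes: dict, noise: float) -> dict:
    """Expected lookalikes left per pair if each is lost with probability noise."""
    return {pair: count * (1.0 - noise) for pair, count in lookalikes.items()}


def ranking(counts: dict) -> list:
    """Pairs ordered from most to fewest lookalikes."""
    return sorted(counts, key=counts.get, reverse=True)


# Invented counts of recognizable lookalikes before any noise is added.
before = {"A-B": 40, "A-C": 25, "B-C": 10}
after = expected_survivors(before, noise=0.3)

print(ranking(before), ranking(after))
```

Every count shrinks by the same factor, so the order of the pairs is
unchanged. In a finite sample, pairs with nearly equal counts could
still swap places by chance.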

So, to summarize, I was calculating the LOSS of information
from random interferences, loss of information which
could be used to identify lookalikes which might actually
turn out after more analysis to be cognates.

***

I think I can interpret what John said by adding a
specification of a different kind.
Suppose John is thinking of a form of Multilateral Comparison
in which any match between any two languages
is regarded as valid, and we do not care what PROPORTION
of the total languages or families being compared show a match
belonging to a particular vocabulary set.

Let me agree at the outset that sometimes Greenberg seems
to do this, or does do it. Please accept the word
"sometimes" here; I am not interested in discussing whether
that "sometimes" is rare or frequent, because it is a SEPARATE
question NOT inherent to Multilateral Comparison (take note
of the cases in which Multilateral Comparison has been
successfully used in the past, for examples which did not
need to take that very loose approach).
I do not accept that loose approach.

For me, Multilateral Comparison does NOT mean that
one can choose for each vocabulary item one thinks might
be a proto-cognate set, any pair of languages, and not
care what proportion of the languages or families it is
represented in.  That was NOT true of the Multilateral
Comparison done under Catherine the Great, nor, I suspect, of that
done by Greenberg himself in his African Language Classification.
Rather, we DO care that the vocabulary be represented
"widely" across all members of the putative grouping
(whether called a "family" or "stock" or not).
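
A minimal sketch of such a breadth requirement, assuming each candidate
set records which branches of the putative grouping attest it. The
branch labels, the set names, and the threshold `min_share` are
hypothetical; no particular cutoff is proposed here.

```python
def breadth(attested_in: set, all_branches: set) -> float:
    """Share of the grouping's branches in which a lookalike set is attested."""
    return len(attested_in & all_branches) / len(all_branches)


def widely_represented(sets: dict, all_branches: set, min_share: float) -> list:
    """Names of lookalike sets attested in at least min_share of the branches."""
    return [name for name, attested in sets.items()
            if breadth(attested, all_branches) >= min_share]


# Invented example: four branches, three candidate lookalike sets.
branches = {"B1", "B2", "B3", "B4"}
candidates = {
    "set_1": {"B1", "B2", "B3"},  # attested in 3 of 4 branches
    "set_2": {"B1", "B4"},        # attested in 2 of 4 branches
    "set_3": {"B2"},              # attested in a single branch
}

print(widely_represented(candidates, branches, min_share=0.5))
```

Setting `min_share` very low corresponds to the loose approach, in which
a match in any two languages is enough to count.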

The problem is that we do not yet have MODELS of the
rate at which such vocabulary representation decreases,
becomes less broad, with increasing time depth,
discussed precisely relative to
this question of the broadness of representation,
vs. representation in, at the extreme, only two languages.
If we pay no attention to how widely distributed a
set of lookalikes is, we are indeed likely to get more
noise included and treated as likely cognates,
where the patterning might suggest if we were omniscient
that the particular lookalikes are not particularly likely
to be cognates.

There is my attempt to understand what I think John
McLaughlin may have been pointing to in his message
today.

In any case, I wish to reinforce that I expect the rate
of tailing off of multiple-language representation
of TRUE cognates (under any reasonable algorithm) to be
a pattern which we CAN study in known cases.
I don't think theoretical calculations are anywhere near
as valuable as the study of known cases, in which
we attempt to measure how much more difficult
matters get as we gradually increase time depth.

Do it for Indo-European, for goodness' sake,
where we AGREE that the languages all belong
to one family, and yet MUCH vocabulary
is NOT represented throughout the family.
That does not cause us to doubt the reality of IE.
When pushed by the data pattern, linguists discuss
dialect chains or even networks or areal subparts of IE.
So how much of this should we expect for different
depths of relationship?
Get the statistics on this, formulated in a simple
and objective way which can be compared with
other cases, both known and unknown.
Across different levels of depth of dialects,
families, family groupings and super-families of IE,
see whether the rate of representation tails off in
a linear, geometric, or other pattern with increasing
depth, and what the range of variation is for different
instances of the "same" time depth (which might
depend on different social situations, such as
the relative isolation of Icelandic, vs. Scandinavian,
vs. mainland Germanic from other closely related
languages of their family).
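
One simple way to ask whether the tailing off looks linear or geometric
is to fit both shapes and compare the errors. The sketch below assumes
the share of vocabulary represented across the grouping has already
been measured at several depths. The figures are invented placeholders,
not measurements, and the geometric fit is done on logarithms, which is
a common approximation rather than the only possible choice.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def squared_error(observed, predicted) -> float:
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def compare_decay(depths, shares) -> dict:
    """Squared error of a linear fit and of a geometric fit.

    All shares must be greater than 0, since the geometric fit uses logs.
    """
    a, b = fit_line(depths, shares)
    linear = [a + b * d for d in depths]
    log_a, log_r = fit_line(depths, [math.log(s) for s in shares])
    geometric = [math.exp(log_a + log_r * d) for d in depths]
    return {"linear": squared_error(shares, linear),
            "geometric": squared_error(shares, geometric)}


# Invented placeholder figures: share represented at depth levels 0..4.
depths = [0, 1, 2, 3, 4]
shares = [0.80, 0.40, 0.20, 0.10, 0.05]

print(compare_decay(depths, shares))
```

The shape with the smaller error fits these figures better. The range
of variation among instances of the "same" depth would need separate
treatment, for example by repeating the fit per subgroup.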

Best wishes,
Lloyd Anderson


