Random Noise - quite different questions?

Larry Trask larryt at cogs.susx.ac.uk
Wed Sep 8 08:10:37 UTC 1999


On Fri, 3 Sep 1999, Eduard Selleslagh wrote:

[on John McLaughlin's posting]

> I'm equally intrigued: I would rather expect the amount of spurious
> results to decrease as the number of languages involved increases,
> since the number of chance resemblances, false potential cognates
> etc. (which I would call noise, i.e. meaningless 'results' of the
> comparison) common to all or a significant number of the languages
> involved decreases.  It is simply a matter of the number of
> intersecting sets, mathematically speaking. Or was something else
> intended?

As the number of languages under comparison increases, the number of
spurious "matches" increases much faster than the number of languages,
as John has pointed out.
Just how big a problem this is depends on how you proceed.  If, for
example, you accept as a "hit" a match between only two languages, or
only three languages, then increasing the total number of languages
under comparison from, say, five to ten to fifteen to twenty will
virtually guarantee that the spurious matches will soon overwhelm any
real matches that may exist.
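
To put rough numbers on this, here is a minimal sketch in Python. It
assumes, purely for illustration, that a wordlist of 100 meanings is
compared and that any two unrelated languages show a chance resemblance
for a given meaning with probability 0.05. Both figures, and the
function name, are hypothetical and not taken from anyone's data. The
point is only that the number of language pairs, and with it the
expected number of spurious two-language "hits", grows roughly with the
square of the number of languages.

```python
from math import comb


def spurious_pair_matches(n_languages, n_meanings, p_chance):
    """Return (number of language pairs, expected chance pairwise matches).

    Assumes every pair of languages has the same probability p_chance of
    a chance resemblance for each meaning (an illustrative assumption).
    """
    pairs = comb(n_languages, 2)  # n * (n - 1) / 2 pairs to compare
    return pairs, pairs * n_meanings * p_chance


# Hypothetical settings: 100 meanings, 5% chance resemblance per pair.
for n in (5, 10, 15, 20):
    pairs, expected = spurious_pair_matches(n, n_meanings=100, p_chance=0.05)
    print(f"{n:2d} languages: {pairs:3d} pairs, "
          f"about {expected:.0f} expected chance matches")
```

Going from five languages to twenty multiplies the number of pairs
from 10 to 190, so under these assumptions the expected number of
chance two-language matches rises nineteenfold while the number of
languages only quadruples.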

You can only deal with this by requiring matches to exist among a
sizeable proportion of all the languages.  If you only accept matches
occurring in, say, 75% of all the languages being compared, then
spurious matches won't be a problem.  But you're not likely to find
many genuine matches either, if any exist, unless the languages are so
closely related that the relationship is obvious upon inspection, in
which case there is no point in undertaking the exercise.
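
The trade-off can be sketched with a simple binomial model, again in
Python. The assumptions are mine and only illustrative: each language
independently shows a form resembling the candidate with some fixed
probability, which is low (0.05 here) for a chance resemblance and
higher (0.5 here) for a genuine cognate that has survived recognizably
in only some of the languages. The function names and both
probabilities are hypothetical.

```python
from math import ceil, comb


def languages_required(n_languages, proportion):
    """Smallest number of languages that meets the required proportion."""
    return ceil(proportion * n_languages)


def prob_at_least(n, k, p):
    """P(at least k of n languages show the form), with X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


n = 20
k = languages_required(n, 0.75)  # 15 of 20 languages

# Hypothetical per-language probabilities, chosen only for illustration.
p_chance = 0.05   # chance resemblance in an unrelated language
p_genuine = 0.5   # genuine cognate still recognizable in a given language

print("threshold:", k, "of", n)
print("spurious match passes:", prob_at_least(n, k, p_chance))
print("genuine match passes: ", prob_at_least(n, k, p_genuine))
```

Under these assumptions a chance resemblance almost never clears the
75% threshold, but a genuine cognate retained in only half the
languages rarely clears it either. The threshold only lets genuine
matches through when retention is high, which is the case of languages
so closely related that the relationship is already obvious.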

Alexis Manaster Ramer once wrote a paper addressing this issue.
Unfortunately, I have the reference at home at the moment, though I can
dig it out if anyone wants it.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk


