X>Y>X

Larry Trask larryt at cogs.susx.ac.uk
Thu Nov 12 16:37:20 UTC 1998


----------------------------Original message----------------------------
On Thu, 12 Nov 1998, H.M.Hubey wrote:
 
> If there are people who are writing programs to paint (yes, produce
> art), and compose music, it takes no genius to see that even if
> Starostin only wrote (or got a student to write) a brute-force, dumb
> program on a commodity grade PC, he can uncover relationships that
> many humans cannot do, even if they collaborate.  The reason for
> this will take too long to explain. But given a set of words (and
> their meanings) even a brute-force program can keep cranking 24
> hours a day to produce cognates via regular sound changes, clusters,
> and things that a typical linguist does not even know exists.
 
No; this is not remotely so.
 
The fundamental problem here, I think, is this.  Such a dumb brute-force
approach is obliged to treat all data on an equal footing.  But the
first thing you learn when you take up historical linguistics is that
you *cannot* treat all data on an equal footing.
 
Anyway, I might point out that just such brute-force programs already
exist, that they have already been developed to a certain level of
sophistication beyond the maximally dumb, and that they have already
been applied to a number of individual cases.  The ones I know most
about are those developed at Cambridge, and these are interesting.
 
However, these programs, interesting as they are, have certain inherent
weaknesses.
 
First, they cannot prove a linguistic relationship.  At best, they can
conclude that a genetic relationship is likely at the confidence level
of 95%, or 99%, or whatever.  And even these impressive-looking levels
generally only arise in cases in which linguists have already
established that a genetic link exists.
 
Second, and more seriously, they cannot distinguish relatedness from
non-relatedness.  If you feed in data from, say, English, Dutch, French
and Chinese, what you get is a tree in which English and Dutch are the
two closest languages, French is somewhat more distantly connected, and
Chinese is more distantly connected still.  That is, the programs cannot
distinguish an unrelated language from a distantly related language.
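
The point can be illustrated with a minimal sketch, assuming a generic average-linkage clustering procedure. This is not the algorithm used by the Cambridge programs, and the dissimilarity scores below are invented purely for illustration. The names DIST, leaves, avg_distance and build_tree are all hypothetical.

```python
from itertools import combinations

# Hypothetical dissimilarity scores (0 = identical, 1 = nothing shared).
# These numbers are invented for illustration; they are not measurements.
DIST = {
    frozenset(("English", "Dutch")): 0.30,
    frozenset(("English", "French")): 0.70,
    frozenset(("Dutch", "French")): 0.72,
    frozenset(("English", "Chinese")): 0.95,
    frozenset(("Dutch", "Chinese")): 0.96,
    frozenset(("French", "Chinese")): 0.97,
}

def leaves(node):
    """Return the language names under a tree node (a str or a nested tuple)."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])

def avg_distance(a, b):
    """Average pairwise distance between the leaves of two nodes."""
    pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
    return sum(DIST[frozenset(p)] for p in pairs) / len(pairs)

def build_tree(languages):
    """Average-linkage clustering: repeatedly merge the two closest nodes."""
    nodes = list(languages)
    while len(nodes) > 1:
        a, b = min(combinations(nodes, 2), key=lambda p: avg_distance(*p))
        nodes = [n for n in nodes if n not in (a, b)]
        nodes.append((a, b))
    return nodes[0]

tree = build_tree(["English", "Dutch", "French", "Chinese"])
print(tree)
```

Whatever scores are fed in, a procedure of this kind keeps merging until a single tree remains, so Chinese is always attached somewhere. Nothing in the procedure lets it report that a language does not belong in the tree at all.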
 
One more thing.  One of these programs has the curious habit of
reporting a strong link between French and Hungarian at the 95%
confidence level or above.  Not IE and Hungarian, you understand: just
French and Hungarian.  Brute force or not, a mere dumb program is
capable of reaching conclusions which any knowledgeable linguist knows
are just plain wrong.
 
Properly designed programs, in the hands of skilful linguists, are
potentially capable of becoming a useful tool -- but certainly not a
replacement for ordinary work in historical linguistics.
 
Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK
 
larryt at cogs.susx.ac.uk


