X>Y>X

Sat Nov 14 20:34:50 UTC 1998

----------------------------Original message----------------------------
Mark Hubey makes an interesting, visionary and somewhat naive point --
as follows:

>If there are people who are writing programs to paint (yes, produce
>art), and compose music, it takes no genius to see that even if Starostin
>only wrote (or got a student to write) a brute-force, dumb program on a
>commodity grade PC, he can uncover relationships that many humans cannot
>do, even if they collaborate.  The reason for this will take too long to
>explain.
>But given a set of words (and their meanings) even a brute-force
>program can keep cranking 24 hours a day to produce cognates via regular
>sound changes, clusters, and things that a typical linguist does not
>even know exists.

In principle, once someone has established cognate relationships, they can
indeed write an algorithm to derive the original data from the
reconstruction.  But that is the REVERSE of what Mark envisions.  As I
understand the discovery procedure Mark imagines, we take all the words
(from dictionaries -- but shouldn't it be morphemes?) in all the languages
of the world and run all possible correspondences on them (whatever "all"
means in this context, e.g., k = t, k = p, k = a, k = @@, etc. etc. etc.)
and then choose the ones that work out best (semantically, phonetically and
in terms of number of cognates per language pair -- or whatever, AND apply
appropriate statistical tests to make sure the "best" is above chance, AND
fight like hell against anyone who dares claim "borrowing", "wander-words"
and all that esoterica, in cases where the statistical tests are not
overwhelmingly decisive, etc. etc.) and VOILA.  I wonder if Mark realises
what is involved in such a program, and how pitifully simple, by
comparison, the chess program that beat the world champion is (with its
evaluation of 3 gazillion positions per minute or whatever it was).
Because of what can be done a posteriori, I don't dismiss the vision out of
hand -- but in view of what's really involved, we'll be arguing the way we
do now for generations to come before anything remotely resembling Mark's
vision emerges for any use other than cryptography.
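For what it's worth, here is a toy Python sketch of just the first rung of that ladder, to show how little the "brute-force" part actually covers.  Everything in it is an assumption for illustration: the two word lists are invented (not real languages), the meanings are assumed to be already aligned pair by pair, and only word-initial segments are compared.  It tallies every initial correspondence that occurs (k = t, k = p, and so on), takes the most frequent one as "best", and then runs a Monte Carlo permutation test: shuffle the meanings many times and count how often chance does as well.

```python
import random
from collections import Counter

# Hypothetical, invented word lists; position i in each list is assumed
# to carry the same meaning.  These are NOT data from any real language.
LANG_A = ["kama", "kilu", "kopa", "kene", "kuso", "suri", "mato", "nali", "pemu", "raki"]
LANG_B = ["tama", "tilo", "topa", "tena", "tusa", "sure", "mata", "nalu", "pemo", "rati"]


def initial_pairs(list_a, list_b):
    """Tally every word-initial correspondence (a = b) over meaning-matched pairs."""
    return Counter((wa[0], wb[0]) for wa, wb in zip(list_a, list_b))


def best_correspondence_score(list_a, list_b):
    """Score of the single most frequent initial correspondence."""
    return max(initial_pairs(list_a, list_b).values())


def permutation_test(list_a, list_b, trials=2000, seed=0):
    """Monte Carlo permutation test: how often do shuffled meanings do as well?"""
    rng = random.Random(seed)
    observed = best_correspondence_score(list_a, list_b)
    shuffled = list(list_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the meaning alignment at random
        if best_correspondence_score(list_a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    observed, p = permutation_test(LANG_A, LANG_B)
    print("best initial correspondence count:", observed)  # 5 for these toy lists
    print("permutation p-value:", p)
```

Note what it leaves out: morphology, medial and final segments, conditioned changes, semantic drift, and any way of telling borrowings from inherited words, which is most of the problem described above.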

He ends with the usual premise of science fiction writers and pop accounts
about scientist-heroes and their breakthroughs:

>It's too bad that the attitude of most linguists is, in fact, the most
>damaging to themselves and their own professions. But, that is the way
>evolution is. Short term goals and intuition only go so far.

In other words, the "establishment" are the usual and familiar small-minded
villains who hinder progress and are too vested in their own positions to
recognise a good idea when it is shoved up their ....  Not me, Mark.  I'm
open-minded.  So explain, better than I just did above, how this
brute-force program works, and how it addresses all the issues I
compressed into my description.  Can it also decipher Linear A etc. (or do
I mean "translate")?

P.S.  Anybody.  What is the promise of programmed cryptographic methods for
discovering "non-obvious" sound correspondences and so on among the lexica
of different languages?

P.P.S.  Unlike cryptography, doesn't a programmed "cognate" search have to
start with semantic features (and only then the gazillions of brute-force
attempts at sound correspondences)?  How should a semantic feature search
be organised for big-time time-depths?  (Cf. Johanna's remarks on "five"
different meanings for a cognate, e.g., "fly", "wing", "feather", "fur",
etc.  "Five" is a very rough count.  Cognates or assumed cognates exhibit
different DEGREES of semantic resemblance, from obvious to "far-fetched",
e.g., "night", "bump into things in the dark", "get knocked out", "sleep",
"lie down", "sexual intercourse", "AIDS", "dirty needles", "gay", "happy",
etc.  That also has to be modelled for statistical measurement, e.g.,
"wing" is "closer" to "feather" than to "fur", etc. -- but "fur flies", at
least in English.)
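A minimal sketch of one way the "closer"/"farther" idea might be modelled, assuming (hypothetically) that someone hand-codes a graph of semantic links and takes the number of links on the shortest path as the distance.  The links below are invented for illustration and make no claim about any real semantic database; a breadth-first search finds the shortest path.

```python
from collections import deque

# Hypothetical semantic links: each pair is a judgement that two senses are
# one "step" apart.  Invented for illustration only.
SEMANTIC_LINKS = [
    ("fly", "wing"),
    ("wing", "feather"),
    ("feather", "fur"),
    ("night", "sleep"),
    ("sleep", "lie down"),
]


def build_graph(links):
    """Undirected adjacency sets built from (sense, sense) pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_distance(graph, start, goal):
    """Links on the shortest path from start to goal (BFS), or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    graph = build_graph(SEMANTIC_LINKS)
    print(semantic_distance(graph, "wing", "feather"))  # 1
    print(semantic_distance(graph, "wing", "fur"))      # 2
```

With these links "wing" is one step from "feather" and two from "fur", and unconnected senses get no distance at all.  A real model would need weighted links and some defensible way of choosing them, which is exactly where the "far-fetched" cases cause trouble.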

P.P.P.S.  Mark wrote:
>...most of the IE words could be due to the substratum which
>could have been a family. One can always insist that the reason why IE
>words resemble each other is because they are all left over from a
>previous language which was spread out over the same region.

Isn't that like saying that the "Odyssey" wasn't composed by Homer but by
somebody else with the same name (or whom we choose to call Homer -- yeah,
I know, but he wasn't blind)?  If not that, then Mark is proposing an
extremely complicated and unlikely hypothesis, and in the absence of
contrary evidence we choose the simpler one ("science" actually makes
decisions like that -- in the absence of contrary evidence, I said).

I'm not even sure that I can imagine what the complicated hypothesis is in
this case.  Maybe something like this: IE had fragmented into Germanic,
Slavic, etc., but the current languages classified as Germanic, Slavic,
etc. don't descend from them; they just happened to borrow most of their
vocabulary and grammar from them.  So, then, BY WHAT CRITERIA do we decide
that these languages borrowed so much from their IE neighbors, rather than
that their speakers gave up their own presumably non-IE languages and
ADOPTED those IE languages, so that they are INDEED descendants of IE?