From ratcliff at fs.tufs.ac.jp  Fri Sep  1 13:16:51 2000
From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe)
Date: Fri, 1 Sep 2000 09:16:51 EDT
Subject: Q: the 'only six' argument
Message-ID: <FRI.1.SEP.2000.091651.EDT.>


> ----------------------------Original message----------------------------
> Larry Trask wrote:
>
> > So, my question: does anybody believe that any version of this
> > statement is valid?  More precisely, do we have a number N and
> > a set of criteria C such that the existence between two languages
> > of N matches satisfying criteria C is enough to guarantee that
> > the languages must be related?

Wasn't this what Donald Ringe was trying to do? (On Calculating the
Factor of Chance in Language Comparison, Philadelphia 1992).

But the statement phrased as you have it is certainly not valid.  First,
no number of "matches" (sound correspondences?) can *guarantee* that the
languages are related, only that the probability of their being related
is high. Second, "related" has to be understood as historically related
rather than genetically related, because numerical criteria only help to
decide the issue of chance vs. non-chance similarity, not which type of
historical contingency (descent from a common source or subsequent
contact) may have produced the non-chance pattern. Third, there is no
absolute number valid in all cases, because it depends on the size and
nature of the sample being compared. Specifically, in the case of sound
correspondences, the bigger the dictionary or word list, the more chance
correspondences can be expected; and the smaller the segment inventories
of the languages compared, the more chance correspondences can be
expected. This is because the average expected number of chance
occurrences of an event (in this case a correspondence at a given
position in a word) is the probability of the event (in this case the
relative frequencies in the given position of the segments compared,
multiplied by each other) times the number of trials (in this case the
number of semantically equivalent words available for comparison).

So if you have two languages A and B, both of which have only ten
consonants evenly distributed in word first position, and you have an
A-B dictionary which has 10,000 entries correlating one word in A with
one and only one semantic equivalent in B, with no synonyms in either
language, you'd expect to find about 100 matches between any first
consonant in A and any first consonant in B (chance that x will occur as
first consonant in A: 1/10, multiplied by chance that y will occur as
first consonant in B: 1/10, multiplied by total places where 1st C of a
word in A can be compared with 1st C of word in B: 10,000). So you
wouldn't be justified in suspecting a historical relationship till you
got a good bit over 100 matches.  On the other hand, if you had two
languages with 25 consonants evenly distributed and a lexicon based on
a 1000-word random sample, you'd expect an average of only 1.6 first
consonant matches (1/25 * 1/25 * 1000). So you'd be justified to suspect
a non-accidental, hence historical relationship even with as few as 4 or
5 matches.
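
The arithmetic in the two scenarios above can be checked with a short
Python sketch. The function name and parameters are illustrative only
(they come from neither the post nor Ringe's book), and the sketch
assumes, as the text does, evenly distributed consonants and exactly one
counterpart per word.

```python
def expected_chance_matches(inventory_a, inventory_b, word_pairs):
    """Expected number of chance matches for ONE specific pairing of a
    first consonant in A with a first consonant in B.

    Assumption (from the text): consonants are evenly distributed, so
    each consonant has relative frequency 1 / inventory size.
    """
    p_a = 1 / inventory_a   # chance that x is the first consonant in A
    p_b = 1 / inventory_b   # chance that y is the first consonant in B
    return p_a * p_b * word_pairs


# Scenario 1: ten consonants each, 10,000 dictionary entries.
print(round(expected_chance_matches(10, 10, 10_000), 2))   # about 100

# Scenario 2: 25 consonants each, 1000-word random sample.
print(round(expected_chance_matches(25, 25, 1_000), 2))    # 1.6
```

With real data one would replace 1/inventory size by the observed
relative frequency of each consonant in the position compared.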


 Bobby D. Bryant wrote:

>
>
> In short, I don't think such a formalization of the problem in terms
> of N and C is going to work in practice.  At some level you are always
> going to have to pile on enough examples to convince your peers, which
> is of course the way things have always worked.
>

Piling on enough examples to convince your peers no longer works in
practice, or else the long-distance comparison debates would not have
become as acrimonious as they have. Formalizing the problem seems to me
to be the only way forward. Besides, isn't that where the joy of
research lies-- in ever sharpening and refining our understanding of our
subject matter and of the tools we use to analyze it?


-- -----------------------------------------------------------
Robert R. Ratcliffe
Dept. of Linguistics and Information Science
Tokyo University of Foreign Studies
Asahi-machi 3-11-1,
Fuchu-shi, Tokyo
183-8534 Japan


From jer at cphling.dk  Sun Sep  3 14:19:19 2000
From: jer at cphling.dk (Jens Elmegaard Rasmussen)
Date: Sun, 3 Sep 2000 10:19:19 EDT
Subject: Q: the 'only six' argument
In-Reply-To: <39AE4FFF.1750AFC5@mail.utexas.edu>
Message-ID: <SUN.3.SEP.2000.101919.EDT.JER@CPHLING.DK>

----------------------------Original message----------------------------
Dear List,

I have occasionally shocked my students by insisting that ONE *probative*
example is enough to prove the point for which it is probative. The
statement, of course, is tautological: If the example did NOT prove its
point, it would not be probative, for that's what the word probative
means.
   The consequence is that, e.g., in Indo-European, certain disputed
groupings MUST be accepted unless we are willing to swallow very awkward
camels: If the Celtic superlative in *-isamo- and the Italic one in
*-is(s)amo- cannot be imagined to be parallel developments (from *-mHo-
[whence Ital./Celt *-amo-] with deictic vs. *-isto- with other
adjectives), and one cannot be assumed to have been borrowed from the
other (would you borrow a new form of the superlative, if your language
has a perfectly good one already?), then there WAS an Italo-Celtic node in
the splitting-up of the IE unity.  Similar arguments could be set up for
some of the points uniting Baltic and Slavic which look strong enough in
themselves to carry the burden of proof even if they were not supported by
others.
   Nice to see the list blossoming again.

   Jens E. Rasmussen


From hwhatting at hotmail.com  Mon Sep  4 12:46:49 2000
From: hwhatting at hotmail.com (Hans-Werner Hatting)
Date: Mon, 4 Sep 2000 08:46:49 EDT
Subject: Q: the 'only six' argument
Message-ID: <MON.4.SEP.2000.084649.EDT.HWHATTING@HOTMAIL.COM>

----------------------------Original message----------------------------
On Sun, 3 Sep 2000 10:19:19 EDT, J. E. Rasmussen wrote:
>I have occasionally shocked my students by insisting that ONE *probative*
>example is enough to prove the point for which it is probative. The
>statement, of course, is tautological: If the example did NOT prove its
>point, it would not be probative, for that's what the word probative
>means.
>    The consequence is that, e.g., in Indo-European, certain disputed
>groupings MUST be accepted unless we are willing to swallow very awkward
>camels: If the Celtic superlative in *-isamo- and the Italic one in
>*-is(s)amo- cannot be imagined to be parallel developments (from *-mHo-
>[whence Ital./Celt *-amo-] with deictic vs. *-isto- with other
>adjectives), and one cannot be assumed to have been borrowed from the
>other (would you borrow a new form of the superlative, if your language
>has a perfectly good one already?), then there WAS an Italo-Celtic node in
>the splitting-up of the IE unity.  Similar arguments could be set up for
>some of the points uniting Baltic and Slavic which look strong enough in
>themselves to carry the burden of proof even if they were not supported by
>others.

Languages don't only borrow words or formations because they don't have an
adequate expression for a concept. Simply imitating a formation seen as more
expressive or the usage of a language which is seen as more prestigious also
plays a role. A good example from modern German is the borrowing of the
English way of expressing the year in which an event happened. The
traditional way in German is to say "Es geschah 1999.", but now quite often
one can find "Es geschah in 1999.", which is a clear calque on English. The
reason behind this is, of course, the big prestige of the English language,
its far-spread knowledge, and also that this formation is more expressive
than the traditional German one. A superlative formation seems to be a good
candidate for borrowing on grounds of expressiveness.
I don't want to say that the superlative formation quoted cannot serve as
proof for Italo-Celtic unity. But if there is only one example (in this
case, of course, there are more than one, but the evidence is still
inconclusive), one can never exclude borrowing. The only thing it proves is
that the speakers of Proto-Celtic and Proto-Italic have been living close
enough to borrow from one another.
I would like to add the following to the general discussion:
1.) No quantity of matches can ever prove genetic relationship. One can
probably find thousands of matches between, e.g., French and English or
Latin and Albanian, without Albanian or English being Romance languages.
2.) There is, as far as I know, some sort of communis opinio that certain
matches (from basic vocabulary, grammatical morphemes) are more important
for proving genetical relationship than others.
3.) I would recommend that if one has collected one's matches, one should
try a reconstruction. If the results are a decent basic vocabulary, and a
basic common grammar, the languages examined are most probably genetically
interrelated. There's of course the question how to define "decent basic
vocabulary" and "basic common grammar", and that's (besides the
questionableness of many matches) the main problem for wide-range
reconstructions like Nostratic, Proto-World etc. Anyone interested in
formulating some minimalist criteria?
4.) Always look at the history behind the matches. Are there historical
links between the carriers of the respective languages, and of which kind
are they? This is of course impossible if the history is not known, and if
one wants to use language to reconstruct history.
--
Essentially, I think a numerical approach does not take us very far. The
most important question seems to me, can we reconstruct a system based on
the matches, and what does it look like? If we get a basic grammar and basic
vocabulary, there are strong reasons to suspect genetical relationship; if
we get (say) a group of religious words, we can assume borrowing based on
religious influences, and so on. Here, of course, numbers play a role - one
simply needs a sufficient number of matches to constitute a system. But if
we have too small a number of matches to form a convincing system, only
historical evidence can help.

Best regards,
Hans-Werner Hatting, mag. phil.




From r.rankin at latrobe.edu.au  Mon Sep  4 12:40:49 2000
From: r.rankin at latrobe.edu.au (R. Rankin)
Date: Mon, 4 Sep 2000 08:40:49 EDT
Subject: Q: the 'only six' argument
Message-ID: <MON.4.SEP.2000.084049.EDT.R.RANKIN@LATROBE.EDU.AU>

----------------------------Original message----------------------------
Larry Trask wrote:

> Quite often, in my reading, I've come across a statement of the
> following type:
>         "The presence of only six good matches between two languages
>         is enough to show that the languages must be genetically related."
> ... the number is always different.  Six is the smallest I've ever seen, but
> I've also seen 15, 50 and various other numbers.

I don't recall often seeing such claims, but it is probable that I just
disregarded them and read on.  Personally, I'm very skeptical about the
possibility of developing any airtight criteria for genetic relationship that
will work cross-linguistically.  This sort of thing has to be done on a
case-by-case basis.  Factors may include various structural considerations
(phonological, morphological and lexical), likelihood of creolization,
likelihood of participation in a Sprachbund, etc.

Meillet is said to have remarked that one could tell if a language were
Indo-European or not just by examining the conjugation of the verb 'be'
(though at my present location I cannot give you a citation).  I tend to agree
with his insistence on morphological criteria, but still think there are far
too many potential variables for us to permit ourselves to be dogmatic.
Noodling around with "universal criteria" is an enterprise for synchronists;
we should not let ourselves be seduced into trying it in genetic linguistics.

Bob Rankin

--
Robert L. Rankin, Visiting Fellow
Research Center for Linguistic Typology
Institute for Advanced Study
La Trobe University
Bundoora, VIC 3083 Australia

Office: (+61 03) 9467-8087
Home:   (+61 03) 9499-2393


From degraff at MIT.EDU  Tue Sep  5 09:57:18 2000
From: degraff at MIT.EDU (Michel DeGraff)
Date: Tue, 5 Sep 2000 05:57:18 EDT
Subject: Q: the 'only six' argument
In-Reply-To: Your message of "Mon, 04 Sep 2000 08:46:49 EDT."
 <F159cIfLuEJmMcMHvPd00002a0d@hotmail.com>
Message-ID: <TUE.5.SEP.2000.055718.EDT.DEGRAFF@MIT.EDU>

----------------------------Original message----------------------------

Holding humbly and tightly on my creolist-cum-syntactician hat, I would
like to inquisitively and constructively piggy-back on Hans-Werner
Hatting's observations and questions regarding (alleged) criteria for
genetic relatedness.

> 1.) No quantity of matches can ever prove genetic relationship. One can
> probably find thousands of matches between, e.g., French and English or
> Latin and Albanian, without Albanian or English being Romance languages.

In a similar vein, note that the etymology of Haitian Creole---a (so
called) "non-genetic" language---is overwhelmingly French while the lexicon
of Modern English---a (so called) "genetic" language---is mostly
non-Germanic etymologically.  Besides, virtually all Haitian Creole affixes
have cognates in French affixes whereas English has many affixes of
non-Germanic origins.

By the way, the latter observation about Haitian Creole suffices to
falsify all these `classic' Creole-genesis scenarios that posit an
affixless-pidgin phase a la Jespersen, Bickerton, McWhorter, Seuren, etc.

> 2.) There is, as far as I know, some sort of communis opinio that certain
> matches (from basic vocabulary, grammatical morphemes) are more important
> for proving genetical relationship than others.

Virtually all of Haitian Creole's grammatical morphemes are etymologically
French.

> 3.) I would recommend that if one has collected one's matches, one should
> try a reconstruction. If the results are a decent basic vocabulary, and a
> basic common grammar, the languages examined are most probably genetically
> interrelated. There's of course the question how to define "decent basic
> vocabulary" and "basic common grammar", and that's (besides the
> questionableness of many matches) the main problem for wide-range
> reconstructions like Nostratic, Proto-World etc. Anyone interested in
> formulating some minimalist criteria?

Given what I've noted above vis-a-vis lexicon and morphology, it then seems
that *absence* of "basic common grammar" would be *the* structural
criterion for claiming that Creole languages such as Haitian Creole are
"non-genetic" languages that arose via "abnormal transmission" whereas
French, say, is a "genetic" language that arose via "normal transmission".

Let me try and be more precise as to what I think are the implications of
a hypothetical "basic common grammar" with respect to the
genetic-vs-non-genetic hypothesis as it applies to, say, Haitian Creole
vs. French. Whatever features define this "basic common grammar", these
features must diverge when comparing the grammars of (colloquial) 17th-18th
century French dialects to that of Haitian Creole, and such divergences
must be *qualitatively* different than their counterparts in the
("genetic") course of French diachrony.

So far, I have not been able to isolate such features. Whatever divergences
exist between colloquial 17th-18th century French dialects and Haitian
Creole (e.g., `loss' of verbal inflection, verb-placement differences, etc.)
seem to have counterparts in the diachronic course of `genetic' languages.
And what I find most intriguing is that such divergences in `genetic'
diachrony also seem to coincide with the history of contact within these
`genetic' diachronies.  This was, of course, noted by Meillet, although he
would most likely not agree with the conclusions I seem drawn to.

In any case, if the "basic common grammar" remains elusive, then perhaps
it's time to seriously (re-)challenge the alleged (non-)genetic dichotomy
between Creole and non-Creole languages and/or the very concept of "genetic
relatedness" as a linguistically (i.e., *structurally*) definable concept.

Then again, I still need to learn more about the structural basis of
genetic linguistics.  This, I look forward to.

                                 -michel.
___________________________________________________________________________
MIT Linguistics & Philosophy, 77 Massachusetts Ave, Cambridge MA 02139-4307
degraff at MIT.EDU        http://web.mit.edu/linguistics/www/degraff.home.html
___________________________________________________________________________


From larryt at cogs.susx.ac.uk  Tue Sep  5 09:59:09 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Tue, 5 Sep 2000 05:59:09 EDT
Subject: Sum: the 'only six' argument
Message-ID: <TUE.5.SEP.2000.055909.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
I was planning to post a summary of the responses to my query
last week about the 'only six' argument.  However, after the
first few respondents replied to me privately, the responses
shifted to the list, and so all of you will now have seen
most of the responses already.

I will therefore content myself with reporting that no one
who has so far replied has expressed any great sympathy with
any version of the 'only six' argument, and several people
have been openly hostile.

These negative responses don't surprise me at all.  I am
certainly not sympathetic to the 'only six' argument.  It's
just that I keep coming across claims of this sort every now
and again, and I was beginning to wonder if a significant
number of historical linguists were embracing such arguments.
Apparently not.

Anyway, I hope we may continue the discussion on the list,
so long as Dorothy is willing.  My mail spool has been rather
short of interesting historical discussions since the IE list
suddenly collapsed last April.

My thanks to everyone who has replied.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From larryt at cogs.susx.ac.uk  Fri Sep  8 12:18:58 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Fri, 8 Sep 2000 08:18:58 EDT
Subject: Q: German Forst 'forest'
Message-ID: <FRI.8.SEP.2000.081858.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
This is an etymological question.

English 'forest' is, of course, borrowed from Old French,
where it goes back to Late Latin <forestis (silva)> 'outer forest',
with the first element possibly from <foris> 'outside'.

I had always assumed that German <Forst> 'forest' had the same
origin.  But, on checking, I find that things are more complicated.

Some sources agree that the German word is of the same origin
as the English one.  But other authorities, including Kluge,
give a quite different etymology.  They derive <Forst> from an
unrecorded *<forhist>, a derivative of Old High German <foraha>
'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' >
'conifer forest' > 'forest'.  Davis, in his English edition of Kluge,
observes that opinion is divided on this etymology.

Just to complicate things, Middle High German had a word <forest>
'forest', which even the proponents of Kluge's etymology seem to
agree is derived from Latin and unrelated to modern <Forst>.

So, my question is this.  Is there now general agreement on the
etymology of <Forst>?  Or is the question still up in the air?

I ask because, if the Germanic etymology of <Forst> is confirmed,
then 'forest' and <Forst> constitute one of the most wonderful
chance resemblances I have ever seen -- right up there with
English 'much' and Spanish <mucho> 'much', and English 'bad' and
Persian <bad> 'bad'.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From paoram at unipv.it  Sat Sep  9 19:04:21 2000
From: paoram at unipv.it (Paolo Ramat)
Date: Sat, 9 Sep 2000 15:04:21 EDT
Subject: Re: Q: German Forst 'forest'
Message-ID: <SAT.9.SEP.2000.150421.EDT.PAORAM@UNIPV.IT>

----------------------------Original message----------------------------

-----Original message-----
From: Larry Trask <larryt at cogs.susx.ac.uk>
To: HISTLING at VM.SC.EDU <HISTLING at VM.SC.EDU>
Date: Saturday, 9 September 2000 1:55
Subject: Q: German Forst 'forest'


>----------------------------Original message----------------------------
>This is an etymological question.
>
>English 'forest' is, of course, borrowed from Old French,
>where it goes back to Late Latin <forestis (silva)> 'outer forest',
>with the first element possibly from <foris> 'outside'.
>
>I had always assumed that German <Forst> 'forest' had the same
>origin.  But, on checking, I find that things are more complicated.
>
>Some sources agree that the German word is of the same origin
>as the English one.  But other authorities, including Kluge,
>give a quite different etymology.  They derive <Forst> from an
>unrecorded *<forhist>, a derivative of Old High German <foraha>
>'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' >
>'conifer forest' > 'forest'.  Davis, in his English edition of Kluge,
>observes that opinion is divided on this etymology.
>
>Just to complicate things, Middle High German had a word <forest>
>'forest', which even the proponents of Kluge's etymology seem to
>agree is derived from Latin and unrelated to modern <Forst>.
>
>So, my question is this.  Is there now general agreement on the
>etymology of <Forst>?  Or is the question still up in the air?
>
>I ask because, if the Germanic etymology of <Forst> is confirmed,
>then 'forest' and <Forst> constitute one of the most wonderful
>chance resemblances I have ever seen -- right up there with
>English 'much' and Spanish <mucho> 'much', and English 'bad' and
>Persian <bad> 'bad'.
>
>
>Larry Trask
>COGS
>University of Sussex
>Brighton BN1 9QH
>UK
>
>larryt at cogs.susx.ac.uk
>
>Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
>Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


Dear Larry,
the etymology of Germ. _Forst_ proposed also in the 23rd ed. of Kluge's
Etym. Wtb. der dt. Spr. (by E. Seebold, 1995) sounds rather unconvincing.
From a Gmc. *_forhist_ "Gehegtes" we should have MHG _foerhest_ (with
Umlaut) and NHG *_foerst_, just as we get _lengest_ (<*_langisto_),
_ermest_ (<*_armisto_) etc., and NHG _laengst_, _aermst_.
Moreover, _Forst_ seems not to be Proto-Gmc.: it is attested in Germ. and
Dutch (_vorst_) only. Thus I think you are right: the hypothesis that we
have here a loanword from Latin seems more plausible than the other one.
Also De Vries, Nederl. etymol. Woordenb., says that OHG _forst_ (ca. 800)
may derive from MLat. _forestis_, "reeds in 648 in een oorkonde voor
Stavelot-Malmédy" ['already in 648 in a charter for Stavelot-Malmédy'].

Best,
Paolo


From ratcliff at fs.tufs.ac.jp  Sat Sep  9 19:05:04 2000
From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe)
Date: Sat, 9 Sep 2000 15:05:04 EDT
Subject: Sum: the 'only six' argument
Message-ID: <SAT.9.SEP.2000.150504.EDT.>

 Larry Trask wrote:

> I will therefore content myself with reporting that no one
> who has so far replied has expressed any great sympathy with
> any version of the 'only six' argument, and several people
> have been openly hostile.
>
> These negative responses don't surprise me at all.  I am
> certainly not sympathetic to the 'only six' argument.  It's
> just that I keep coming across claims of this sort every now
> and again, and I was beginning to wonder if a significant
> number of historical linguists were embracing such arguments.
> Apparently not.

Wait just a second there. I may have sounded negative myself. But when I
thought about it a little more, I realized that there is a legitimate
and interesting argument there, and it ought to be in historical
linguistics textbooks if it isn't. (I don't know if this is the
argument you have seen, but I'd be interested to know if it IS in any
textbooks.)

Basically, IF one has set up the question properly and IF one has
carried out the comparison with discipline and honesty (big ifs, of
course), then a very small number of examples of a single sound
correspondence is sufficient to demonstrate a historical (not
necessarily a genetic) relationship beyond any reasonable doubt.
Practically speaking, given the sample sizes we usually work with and
the way that phonological systems are set up, in most cases, the
necessary number is indeed around six or not much more. This isn't
anything for anyone to be hostile to (or sympathetic to, for that
matter); it simply follows necessarily from the logic of probability.
I'll explain, but first a clarification.

When you ask about numerical criteria for a genetic relationship, you
are asking (at least) two separate questions. Most of the respondents
addressed the second question-- what are the criteria for determining if
two historically related languages are related genetically-- as opposed
to being related by contact or borrowing, or by being in a
lexifier-creole relationship. Some respondents addressed the question of
what criteria are relevant for subclassifying genetically related
languages. As far as I can see (and as most of the respondents said),
numerical criteria simply are not relevant for making these kinds of
judgements. It's the nature of the similarities or commonalities, not
the number of them, that counts. In any case probability theory doesn't
come into play because in all these cases we have already ruled out
coincidence as an explanation.

But when approaching unclassified languages or languages which haven't
been compared to each other before, the first question we have to ask is
whether these languages have something in common which cannot be due to
chance or coincidence. Numerical criteria and probability theory are the
most reliable means for making judgements of this type.

 Here's how you end up with only six: First, the average expected number
of chance matches between any two consonants in any two languages (that
is, the expected number of times the consonants will appear in the same
position in a word with the same meaning) is the frequency of the first
consonant in its language times the frequency of the second consonant in
its language times the number of word pairs available for comparison.
Thus if ten percent of the words start with /t/ in one language and ten
percent of the words in the other language start with /b/, then in a
hundred word sample, there should be (by chance) one case where the
translation of a word starting with /t/ in the first language starts
with /b/ in the second.  In a 1000 word sample there should be about
ten such cases. One rough guide to frequency of a consonant is simply 1
over the number of consonants in the inventory. So if you have twenty
consonants the average frequency of each consonant is 1/20 or .05. If
you have a Macintosh with a graph calculator try entering this formula:
1/x^2*n*100 (one over x squared times n times 100). This gives you the
expected number of correspondences, in a sample with n*100 word pairs, of
two languages both with x number of consonants, evenly distributed. You
can see from this that as long as the average size of the consonant
inventory is greater than 10 (or put another way, where no consonant
occupies more than ten percent of the word positions being compared) the
expected number of chance matches in a 100 wd sample is between 1 and 0.
That is, in a 100 word sample you expect that each consonant (in initial
position) in one language will match up with each consonant in the other
in one word or not at all. In a 1000 word sample the expected chance
avgs. are not all that much higher-- basically if the average size of
the consonant inventories is 14 (or the avg. frequency no more than
1/14), you only expect to get 5 chance correspondences, though below 14
the expected number starts to climb dramatically. (At 5 the expected
number is 40).
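
For readers without a graphing calculator, the formula 1/x^2 * n * 100
can be tabulated with a few lines of Python. This is only a sketch of
the calculation described above; the function name and the choice of
inventory sizes are illustrative assumptions.

```python
def expected_matches(x, n):
    """The formula from the text: 1/x^2 * n * 100.

    x: number of consonants in each language (assumed evenly distributed)
    n: sample size in hundreds of word pairs
    """
    return (1 / x**2) * n * 100


for x in (5, 10, 14, 20):
    per_100 = expected_matches(x, 1)     # 100-word sample
    per_1000 = expected_matches(x, 10)   # 1000-word sample
    print(f"{x:>2} consonants: {per_100:6.2f} per 100 words, "
          f"{per_1000:6.2f} per 1000 words")
```

The rows for 14 and 5 consonants correspond to the "5 chance
correspondences" and "40" mentioned above.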

 The next question is how far above the average do we have to get before
coincidence becomes an absurdly unlikely explanation. There is a formula
for this, but I won't go through it since this post has gotten long. But
here is one example: In the case where two languages both have 20
consonants evenly distributed (or more realistically in comparing two
consonants in two languages both of which have a frequency of 5% in the
word-position being compared in their respective languages), the
probability of finding more than 5 correspondences (i.e. 6 or more) in a
100 wd. sample is 0.000000356, or roughly 1 in 2.8 million. (The chance
of finding 5 or more is roughly 1 in 163,000.) So in this set of
circumstances "6 or more" (i.e. a single correspondence set occurring in a
given position-- say word-initially-- in 6 or more words) should be
pretty well conclusive for demonstrating a non-chance and hence almost
certainly historical (genetic or contact) relationship.
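
The post does not spell out the formula it uses. One natural candidate,
offered here purely as an assumption, is the upper tail of a binomial
distribution: each of the 100 word pairs is treated as an independent
trial that shows the given correspondence with probability 1/20 * 1/20.
The Python sketch below (function name illustrative) computes that tail.
Its results are of the same order of magnitude as the figures quoted
above, but need not agree with them digit for digit, since the exact
method and rounding behind those figures are unknown.

```python
from math import comb


def tail_probability(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more matches
    in n word pairs when each pair matches with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p = (1 / 20) * (1 / 20)   # two consonants, each with 5% frequency
for k in (5, 6):
    prob = tail_probability(k, 100, p)
    print(f"{k} or more matches in 100 pairs: {prob:.3g} "
          f"(about 1 in {1 / prob:,.0f})")
```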

 I think that working all this out mathematically is interesting and
important for comparative linguistics for two reasons. First, it means
that if you apply the comparison strictly (allow only one-to-one word
comparisons, and one-to-one phoneme comparisons) you can get more
knowledge from less information-- you can potentially demonstrate a
relationship with much less data than comparativists have traditionally
thought necessary. This is important to me, because I work in
Afroasiatic, where the perpetual concern is exactly how to get more
knowledge with less information (few old texts for most languages).

    But the other side of this is that the mathematics makes it
perfectly clear that if you relax the semantic and phonemic criteria far
enough, you quickly come to a point where the expected number of chance
correspondences becomes so high, that it becomes practically impossible
to mount an effective demonstration of a relationship. The relevant
parameters are number of comparisons and frequency of consonants. If you
allow for comparison of each word with a wide range of semantically
close words you multiply the number of comparisons and effectively
increase the sample size. (A pair of 1000 wd-lists with one-to-one
matching is the same mathematically as two 100 wd. lists with each word
compared with 10 words in the other language-- both give 1000 pairs or
trials).  Going back to the previous example with frequency of 5% for
each consonant, the numbers of matches you need to get to the 1 in a
million or better range for different sample sizes are: 200-8, 500-10,
1000-14, 2000-19. In other words, although the average number of expected
chance correspondences increases in proportion to sample size, the
number needed for reasonable certainty of non-chance goes up at a lower
rate. If you are considering each word in a 1000 wd list against 20 or
30 semantically close words, the effective sample size-- and hence the
number of matches needed to demonstrate a non-chance relationship--
becomes gigantic. (I don't have a calculator powerful enough to
calculate it though, sorry.) Similarly, if you allow many-to-many phoneme
matchings, you effectively increase the frequency. If you compare two
systems of 15 consonants at 3 points of articulation one-to-one the
chance of a match is on average 1/15 squared. The expected number of
chance matches in a 1000 word sample is between 4 and 5 (4.44)--
reasonable. The chance of matching any two consonants at the same point
of articulation  is 1/3 squared. In a 1000 wd. sample the expected
number of chance matches is 111-- a big jump.
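
The thresholds and expected counts in the paragraph above can be
recomputed for any number of trials, including the 20,000 to 30,000 trials
that arise when each word in a 1000-word list is compared with 20 or 30
candidates. The following Python sketch makes the same assumptions as the
example (independent trials, a binomial model, match probability
0.05 * 0.05); the function names and the one-in-a-million cutoff parameter
are illustrative only. Results near the cutoff may differ by one from the
figures quoted, depending on rounding and on the exact model used.

```python
def binomial_pmf(n_trials, p_match):
    """Return [P(X = k) for k in 0..n_trials] using the standard recurrence."""
    pmf = [(1 - p_match) ** n_trials]
    ratio = p_match / (1 - p_match)
    for k in range(n_trials):
        pmf.append(pmf[-1] * (n_trials - k) / (k + 1) * ratio)
    return pmf

def matches_needed(n_trials, p_match, alpha=1e-6):
    """Smallest k such that P(X >= k) <= alpha under pure chance."""
    pmf = binomial_pmf(n_trials, p_match)
    tail = 0.0
    needed = n_trials + 1  # sentinel: no k reaches the cutoff
    for k in range(n_trials, -1, -1):
        tail += pmf[k]  # tail is now P(X >= k)
        if tail > alpha:
            break
        needed = k
    return needed

p = 0.05 * 0.05  # assumed: 5% frequency for each consonant
for n in (200, 500, 1000, 2000, 20000, 30000):
    print(n, "trials: expected", round(n * p, 2), "needed", matches_needed(n, p))

# Expected chance matches in 1000 trials for the two matching schemes.
print(round(1000 * (1 / 15) ** 2, 2))  # one-to-one, 15 consonants
print(round(1000 * (1 / 3) ** 2, 2))   # same point of articulation, 3 points
```

The tail is accumulated from the largest k downward so that the very small
terms are added first, which keeps the floating-point sum accurate near the
cutoff.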

Thus with very loose criteria, the comparatist is in the paradoxical
position of having to prove the existence of hundreds of "bad" (random)
correspondences in order to have any confidence of having found any
good ones (ones which actually reflect language history). And if there
really are any good correspondences, the problem of how to pick them out
from all the random "noise" which is certain to be there is daunting.


  -- -----------------------------------------------------------
Robert R. Ratcliffe
Associate Professor, Arabic and Linguistics,
Dept. of Linguistics and Information Science
Tokyo University of Foreign Studies
Asahi-machi 3-11-1,
Fuchu-shi, Tokyo
183-8534 Japan


From kroch at change.ling.upenn.edu  Thu Sep 14 01:33:29 2000
From: kroch at change.ling.upenn.edu (Tony Kroch)
Date: Wed, 13 Sep 2000 21:33:29 EDT
Subject: Announcing the second edition of the Penn-Helsinki Parsed Corpus
 of Middle English
Message-ID: <WED.13.SEP.2000.213329.EDT.KROCH@CHANGE.LING.UPENN.EDU>

----------------------------Original message----------------------------
The second edition of the Penn-Helsinki Parsed Corpus of Middle English
(PPCME2) is now publicly available under the conditions outlined
below. It consists of 55 text samples containing 1.3 million words of
syntactically annotated Middle English prose and ranging over four
time periods, from 1150 to 1500.

Like the first edition of the PPCME, the PPCME2 is based on the
Middle English portion of the Helsinki Corpus of English Texts that
was created at the University of Helsinki under the direction of
Matti Rissanen and Ossi Ihalainen. The size of the text samples in
the second edition has been enlarged so that the total corpus size is
nearly three times larger. In addition, the corpus is now tagged for
part of speech and the syntactic annotation system is richer.
For the earliest time period, all texts except one are complete; the
exception is the Ancrene Riwle sample, which contains approximately
50,000 words. For the later time periods, two texts per time period
were expanded to approximately 50,000 words.  The remaining texts are
represented by the Helsinki Corpus sample.

The PPCME2 is being distributed on a CD-ROM that includes several files
for each text in the corpus:

        - a file with unannotated text
        - a file with philological and other information about the text
          (manuscript and edition used, date, dialect, genre, and word count
          of the sample)
        - a file in which individual words are tagged for part of speech
        - a file that is annotated for syntactic structure

Available with the corpus is CorpusSearch, a Java program written by
Beth Randall that runs under Unix, Linux, MacOS and Windows.
CorpusSearch uses standard syntactic predicates like ``(immediately)
precedes'', ``(immediately) dominates'', and Boolean combinations
thereof, and it allows the output of a previous search to serve as
input to further searches.

To order the PPCME2, please go to http://www.ling.upenn.edu/mideng and
follow the instructions there.

The cost of a subscription to the corpus is $200 and the cost of a
license for CorpusSearch is $50.  The items may be purchased together
or separately.  Proceeds from the sale of the corpus will pay for
improving the corpus and for increasing its size over time. Proceeds
from the sale of CorpusSearch will go to the author.

The PPCME2 was designed and built by Anthony Kroch and Ann Taylor at the
University of Pennsylvania.  Supplementary assistance was provided by
Beatrice Santorini.  The PPCME2 is part of a larger project to produce a
parsed diachronic corpus of English from 800 to 1800.  The Old English part
is under construction at York under the direction of Anthony Warner, Susan
Pintzuk, and Ann Taylor and the Early Modern English part is under
construction at the University of Pennsylvania under the direction of Kroch
and Santorini.


From Ann.Kumar at anu.edu.au  Thu Sep 14 11:01:01 2000
From: Ann.Kumar at anu.edu.au (Ann Kumar)
Date: Thu, 14 Sep 2000 07:01:01 EDT
Subject: "only six" argument
Message-ID: <THU.14.SEP.2000.070101.EDT.ANN.KUMAR@ANU.EDU.AU>

We have been following the HISTLING discussion initiated by Larry Trask
with interest, because we have been involved over the last two years in a
particular case that had to solve the problem of the amount of data that is
necessary to establish relatedness. (Not genetic, but via borrowing).  We
have been doing what Robert R. Ratcliffe takes as his starting point in his
last e-mail, i.e. "approaching unclassified languages or languages which
haven't been compared before [where] the first question we have to ask is
whether these languages have something in common which cannot be due to
chance or coincidence."  The results will be published in the December
issue of Oceanic Linguistics, but we thought it might interest LIST members
to have a sneak preview, at least of the (rather long) section on
probability, where we discuss relevant issues.  (The section is attached.)


We were trying to find out whether some semantic and phonological matches
in Old Japanese and Old Javanese lexis were too extensive to be due to
chance. In this particular case, rather than looking at single sound
correspondences, we used whole-word comparison, and of longer words (CVCVC
structure) with recurrent sound correspondences.  While it is not possible
to go into the calculations here, it turned out that in this case only one
match between words of this length could be expected to occur by chance. In
the section on probability Rose discusses the usefulness of the approaches
taken earlier by Nichols and Ringe and goes on to propose that a Bayesian,
rather than frequentist, statistical approach should be the preferred
option. We have attached this section.

We agree with Ratcliffe that "Numerical criteria and probability theory are
the most reliable means for making judgements of this type". But we are
able to demonstrate a few more things that might interest LIST readers,
(and can also offer some real data!).  As mentioned, we also have some
points to make concerning the appropriateness of the frequentist (as
opposed to a Bayesian) paradigm for evaluating questions of this kind (i.e.,
assessing the probability of a hypothesis). (Bayesian formulations are
used, for example, in forensics. We don't know to what extent historical
linguists are aware of them, so we offer them in case people are
interested.)

Ann Kumar
Phil Rose


-------------- next part --------------
A non-text attachment was scrubbed...
Name: short_prob.doc
Type: application/mac-binhex40
Size: 71221 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/histling/attachments/20000914/da56d2a7/attachment.hqx>
-------------- next part --------------
 ===========================================================================
Dr Ann Kumar
Vice-President, Australian Academy of the Humanities
Centre for the Study of Asian Societies and Histories
Faculty of Asian Studies
Canberra ACT 0200
Australia
Tel. (02) 6249 3677/4658  fax. (02) 6279-8326


From X99Lynx at aol.com  Mon Sep 25 14:55:08 2000
From: X99Lynx at aol.com (Steve Long)
Date: Mon, 25 Sep 2000 10:55:08 EDT
Subject: Superlative Forms and Swallowing Camels
Message-ID: <MON.25.SEP.2000.105508.EDT.>

----------------------------Original message----------------------------
On Sun, 3 Sep 2000 10:19:19 EDT. jer at cphling.dk wrote:

<<The consequence is that, e.g., in Indo-European, certain disputed groupings
MUST be accepted unless we are willing to swallow very awkward camels:...
[e.g.,] the Celtic superlative in *-isamo- and the Italic one in *-is(s)amo-
... cannot be assumed to have been borrowed from the other (would you borrow
a new form of the superlative, if your language has a perfectly good one
already?)....>>

(Hi, Jens!)
I must ask of course how we know that one language or the other already had
"a perfectly good form of the superlative?"

With all due respect to the writer, from whom I've already learned a great
deal, I must ask whether the case is as clear cut as he perceives it.  This
camel might be the kind you find in animal cracker boxes -- bite-sized.

Ironically, two relevant languages make no morphological distinction between
the comparative and the superlative - Manx and French.  If this says nothing
else, it proves that languages can find themselves without any form of the
superlative, much less "a perfectly good one."

Whatever forces caused the loss of the superlative in those languages may
have caused an earlier loss in either Celtic or Italic.  And that would have
meant one or the other of those two languages may have been in need of a
superlative form and therefore had a very good reason to borrow it.

And doesn't the question <<would you borrow a new form of the superlative, if
your language has a perfectly good one already?>> work both ways?  Why would
"Italo-Celtic" innovate a superlative form when they already had a perfectly
good one?

In my mind this raises again the question of how one distinguishes between a
borrowing and descent from a common ancestor, IF the word or form is actually
old enough to predate indicia of borrowing.

Also, there are those of us who suspect that going back 4000+ years creates a
great deal of uncertainty about what languages -- both IE and non IE -- the
form could have been borrowed from.

The reconstruction the author offers -- "the Celtic superlative in *-isamo-
and the Italic one in *-is(s)amo- cannot be imagined to be parallel
developments (from *-mHo- [whence Ital./Celt *-amo-] with deictic vs. *-isto-
with other adjectives)" -- does not foreclose the possibility that the
development is one that occurred in some third language (or the specialized
dialect of an influential, itinerant linguistic community -- like scribes or
priests) and that both Latin and Celtic "borrowed" it independently.

And finally, why would a language borrow a word like "superlative" when
presumably back in the days of Old English, it "already had a perfectly good
one?"

Regards,
Steve Long


From larryt at cogs.susx.ac.uk  Tue Sep 26 14:35:03 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Tue, 26 Sep 2000 10:35:03 EDT
Subject: Sum: German Forst 'forest'
Message-ID: <TUE.26.SEP.2000.103503.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
Some days ago I posted a query about the disputed etymology of
German <Forst> 'forest'.  I got only three replies, but those
were interesting.

The query was whether German <Forst> derives, like English 'forest',
from a late Latin word, or whether it is a native word derived
ultimately from the German word for 'fir tree'.

Two of the respondents were skeptical of the German etymology.
One of them suggested it might be a residue of the unfortunate
Romantic tendency to seek "Germanic" etymologies for loans from
Latin.  The third, however, was much more enthusiastic about the
Germanic etymology, and noted that the derivation of late Latin
<forestis> from <foris> 'outside' is far from secure, and that
a loan from Germanic has been suggested.  Well, turnabout is fair
play, I guess.

Anyway, it appears that I cannot yet add 'forest' and <Forst>
to my little collection of striking chance resemblances.  But one
of my respondents (SG) sent in a couple of lovely examples of
chance resemblances:

German /Scheune/ "shack" : Coptic /shoine/ id.
German /Schuh/ "shoe" : Itelmen /sxu/ (works even better with Dutch)
aso.

(Itelmen is a Chukcho-Kamchatkan language of eastern Siberia.)

My thanks to David Fertig, Stefan Georg, and Paolo Ramat.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From larryt at cogs.susx.ac.uk  Wed Sep 27 11:35:18 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Wed, 27 Sep 2000 07:35:18 EDT
Subject: Q: Sarich and historical linguistics
Message-ID: <WED.27.SEP.2000.073518.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
In a few weeks, I'm giving a talk on the perception of language
and linguistics among our academic colleagues in other disciplines,
such as psychology, anthropology, archaeology, primatology and
genetics.  Most of this talk will deal with non-historical
matters, but I want also to talk about the seemingly immense
influence of the long-rangers among our colleagues, who often
appear to believe that the long-rangers speak for historical
linguistics.  See, for example, the writings of the geneticist
Robert Sokal, of the palaeoanthropologist Richard Klein, and of
the primatologist Robin Dunbar.

But I've become particularly interested in the writings of the
eminent molecular anthropologist Vincent Sarich, one of the
founders of the out-of-Africa hypothesis of human origins.
Unlike most other non-linguists, Sarich has stepped into
historical linguistics in a big way -- and he doesn't like us
historical linguists very much.  In a 1994 article, he
warmly defends the long-rangers, and he hurls abuse at those
linguists who have criticized their work, accusing the critics
of being anti-scientific and of acting from the basest motives:

Vincent M. Sarich (1994), 'Occam's razor and historical linguistics',
in M. Y. Chen and O. J. L. Tzeng (eds), In Honor of William S.-Y.
Wang, Pyramid Press, pp. 409-430.

But I'm more interested right now in another of Sarich's articles,
published on the Web in 1994 and apparently not published elsewhere.
This article also carries a good deal of abuse directed at the critics
of Greenberg and Ruhlen:

        http://pubpages.unh.edu/~jel/sarich.html

Here is the passage I'm interested in:

"A similar scenario would also appear to apply in the linguistic
realm, but to see it we first need to challenge the extremely
conservative current consensus among most linguists that relationships
among languages that diverged more than perhaps 7,000-8,000 years ago
are, at present, unknowable.  A simple exercise suffices here to show
that this consensus is unreasonably pessimistic.  One simply sits down
with, for example, Buck's A Dictionary of Selected Synonyms in the
Principal Indo-European Languages, a basic word list, and some
independent knowledge of two or more languages representing distinct
Indo-European groups.  I used English and Croatian, representing,
respectively, its Germanic and Slavic branches.  If one then asks what
proportion of the words in modern Croatian appear, simply by inspection
(but allowing for some phonetic and semantic drift), to be cognate with
the reconstructed Proto-Indo-European (PIE) form (or, where that is
unavailable, the English word), one gets a minimum figure of about 60%.
For example, snow, snjeg, *sneigwh; many, mnogo, *monogho; blood, krv,
*kru; tree/wood, drvo, *dru; earth, zemlja, *ghem.  Similar results were
obtained using native speakers of Spanish and Bengali, and for Armenian
and Albanian using Decsy's The Indo-European Protolanguage: a
Computational Reconstruction.  Thus 60% survival seems to be a
reasonably representative figure for the survival of PIE roots with
meanings in extant Indo-European languages.

"Now obviously some number of these matches will be coincidental (though
that number will likely be small, as illustrated by the fact that
Chinese, by the same test, will show less than 10% apparent 'cognacy'
with PIE, English, or Croatian -- I am indebted to Dr W S-Y Wang for
this comparison), but, by the same token, some will be missed when the
degree of phonetic or semantic change makes cognacy less than obvious.
For example -- foot, noga, *ped -- where one might miss the English
correspondence because of the phonetic changes, and would (and, perhaps,
should) certainly miss the Croatian unless one remembered that 'pod' in
Croatian means 'under', and that an association between 'under' and
'foot' is perfectly reasonable.  This would imply a cognacy loss of less
than 10% per millennium along a lineage, implying that even at a time
depth of 12,000-14,000 years; that is, twice the probable time which
separates modern Croatian from its Proto-Indo-European ancestor, one
might retain 30% or so phonetic/semantic cognacy.  Thus one could
recognize relationships among languages whose common ancestor lay that
far in the past provided that one looked at a sufficient number of them,
and avoided simple binary comparisons.  That is, if each of two
descendant languages retains 30% cognacy with the ancestral language,
they will, on average, share only 9% [(0.3)^2] with one another -- and
this gets into the chance area of similarity.  On the other hand, if you
look at 10 such languages, three, on the average, will retain a
particular cognate -- greatly increasing your chances of recognizing
relationships among them, and of reconstructing the ancestral form.
This is the procedure and argument of Greenberg [(1987); see also
discussion in Ruhlen (1987)], and, whatever the questions that might be
raised about certain details, there can be no doubt the current general
consensus among most linguists that relationships among languages older
than about 7,000 years are, at present, unknowable, is unrealistically
and unreasonably pessimistic and conservative."  [END QUOTE]
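
The arithmetic in the quoted passage can be laid out explicitly. The
following Python sketch only reproduces the passage's own figures; it
assumes a constant retention rate per millennium, a time depth of 6.5
millennia for PIE (the midpoint implied by "12,000-14,000 years ... twice
the probable time"), and independent retention in each language. These
assumptions and the variable names are supplied here for illustration;
nothing in the sketch bears on whether they are linguistically sound.

```python
retained_now = 0.60   # share of PIE roots Sarich reports as recognizable
depth_kyr = 6.5       # assumed time depth of PIE, in millennia

# Constant per-millennium retention rate implied by those two numbers.
rate = retained_now ** (1 / depth_kyr)
loss_per_kyr = 1 - rate

# Retention after twice that depth at the same rate.
retained_double = rate ** (2 * depth_kyr)

# The passage rounds this to "30% or so" and works from 0.3.
r = 0.30
shared_by_pair = r * r        # both of two languages retain a given item
expected_retainers = 10 * r   # average number of retainers among ten languages

# Chance that at least two of ten languages retain a given item
# (independence between languages is assumed).
at_least_two = 1 - (1 - r) ** 10 - 10 * r * (1 - r) ** 9

print(f"loss per millennium:           {loss_per_kyr:.1%}")
print(f"retained at double the depth:  {retained_double:.0%}")
print(f"shared by a given pair:        {shared_by_pair:.0%}")
print(f"expected retainers out of ten: {expected_retainers:.1f}")
print(f"at least two of ten retain:    {at_least_two:.1%}")
```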

Now, many of these general issues have been much discussed elsewhere,
and I have my own views, which I will reserve for the time being.
But I am interested in hearing comments from colleagues on any part of
this passage, though most particularly on the following points:

        *the use to which Sarich puts Buck's dictionary;

        *the claim that any given living IE language retains about 60%
        of the PIE lexicon in easily recognizable form;

        *the claim that genuine cognates among living IE languages are
        overwhelmingly obvious and trivial to identify by inspection alone;

        *the claim that this result automatically generalizes to other
        families, even to families which are as yet unrecognized.

Please reply directly to me, since I have no wish to flood this list
with discussions of long-ranger work.  I'll post a summary when I can.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From DISTERH at UNIVSCVM.SC.EDU  Fri Sep 29 12:17:40 2000
From: DISTERH at UNIVSCVM.SC.EDU (Dorothy Disterheft)
Date: Fri, 29 Sep 2000 08:17:40 EDT
Subject: Sarich and historical linguistics
Message-ID: <FRI.29.SEP.2000.081740.EDT.DISTERH@UNIVSCVM.SC.EDU>

In a message dated 9/27/2000 6:36:24 AM, larryt at cogs.susx.ac.uk writes:

<<... I want also to talk about the seemingly immense influence of the
long-rangers among our colleagues, who often appear to believe that the
long-rangers speak for historical linguistics.
...I've become particularly interested in the writings of the eminent
molecular anthropologist Vincent Sarich, one of the founders of the
out-of-Africa hypothesis of human origins. Unlike most other non-linguists,
Sarich has stepped into historical linguistics in a big way -- and he doesn't
like us historical linguists very much.>>

I hope Larry and everyone else will understand my posting this to the list.
I think it's important to just add a few observations about Sarich that may
put his remarks in context.

First of all, it should be remembered that Vincent Sarich has for a long time
taken an advocacy position (and called himself an advocate) regarding certain
aspects of human genetics.  He has been for example prominently involved in
the dialogue on race and IQ.  And it should also be noted that the article
Prof Trask cites (http://pubpages.unh.edu/~jel/sarich.html), entitled "RACE
and LANGUAGE in PREHISTORY", is clearly a piece of "advocacy," which
obviously treats historical linguistics only as it relates to and serves to
advance Sarich's goals with regard to a somewhat larger argument.

Sarich's position on Greenberg and longrangers is pretty much dictated by the
Out-of-Africa hypothesis and various other positions Sarich takes regarding
genetics and human culture.

What is clear from the piece is that Sarich is trying to backdate language far
enough to make its diversity correlate with current human genetic diversity.
Sarich advocates the view that modern human diversity, human intelligence and
cultures were born full-blown at some point after the Out-of-Africa event
some 100,000 years ago -- with relatively little convergence since.  In the
piece, his argument with scientists claiming that language is a recent
development is expressly motivated by his position that language matches up
with racial genetics.  Sarich is not really a lumper in the strict sense.

And given all the above some caution might be called for in using Sarich as
representative of an academic non-linguist's views of historical linguistics.
I suspect that if it better served his larger purposes, he would be citing
Lehmann and Trask.

This isn't the first time of course that historical linguistics has been
called upon to support wider conclusions about human history.  Sarich is
fairly unique however in viewing certain elements of it as supporting
conclusions that reach back some 30,000 years.

It should be said that there are serious scientists who are not comfortable
with Sarich's understanding of the evidence of paleo-culture, much less of
his understanding of paleo-language.  (And that's not to say that the genetic
implications of Out-of-Africa haven't been challenged either.)

Some of us think Sarich may be seriously underestimating paleo-humans and how
long it took to develop something as sophisticated as human biology, human
culture and human language.  On another web page, for example, one can find
an article by the formidable paleobiologist Henry Gee about the Schöningen
spears. (http://quartz.ucdavis.edu/~GEL115/spears.html)  To some, the
sophistication and possibly accumulative design of these 400,000 year old
hunting javelins suggests that they could not have been developed or
redeveloped in a single generation.  And accumulating and transmitting complex
knowledge from one generation to the next suggests some form of transmission,
perhaps some form of language.

Finally, I'd point out also that maybe it is the traditional assumption of
strict vertical descent in languages that makes any part of historical
linguistics attractive to Vincent Sarich and his "anti-convergenist"
monogenetic polemics.  Those of us who think that there may be a relatively
high degree of convergence in linguistic history don't find commonalities
between languages extremely precise in illuminating prehistory or necessarily
indicative of some common noble biological ancestor.  After all, the most
basic function of language is communication and that should move us all to
try to speak the same language, not different ones.

And, of course, it's refreshing for us "convergenists" to see that the
primacy of vertical descent has recently taken a good drubbing in biology.
(See, e.g., Stephen Jay Gould's "Linnaeus's Luck" in Natural History,
September 2000).  And some of us expect the same to eventually happen on a
different level to "Out-of-Africa".

In the meantime, it might be suggested that Vincent M. Sarich's views are not
at all the best reflection of how informed non-linguists understand
historical linguistics.

Steve Long


From ratcliff at fs.tufs.ac.jp  Fri Sep  1 13:16:51 2000
From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe)
Date: Fri, 1 Sep 2000 09:16:51 EDT
Subject: Q: the 'only six' argument
Message-ID: <FRI.1.SEP.2000.091651.EDT.>


> ----------------------------Original message----------------------------
> Larry Trask wrote:
>
> > So, my question: does anybody believe that any version of this
> > statement is valid?  More precisely, do we have a number N and
> > a set of criteria C such that the existence between two languages
> > of N matches satisfying criteria C is enough to guarantee that
> > the languages must be related?

Wasn't this what Donald Ringe was trying to do? (The Factor of  Chance
in Language Comparison, Philadelphia 1992).

But the statement phrased as you have it is certainly not valid.  First
no number of "matches" (sound correspondences?) can *guarantee* that the
languages are related, only that the probability of their being related
is high. Second "related" has to be understood as historically related
rather than genetically related, because numerical criteria only help to
decide the issue chance vs. non-chance similarity, not which type of
historical contingency (descent from a common source or subsequent
contact) may have produced the non-chance pattern. Third there is no
absolute number valid in all cases, because it depends on the size and
nature of the sample being compared. Specifically in the case of sound
correspondences, the bigger the dictionary or word list the more chance
correspondences can be expected; and the smaller the segment inventories
of the languages compared the more chance correspondences can be
expected. This is because the average expected number of chance
occurrences of an event (in this case a correspondence at a given
position in a word) is the probability of the event (in this case the
relative frequency in the given position of the segments compared
multiplied by each other) times the number of trials (in this case the
number of semantically equivalent words available for comparison).

So if you have two languages A and B, both of which have only ten
consonants evenly distributed in word first position, and you have an
A-B dictionary which has 10,000 entries correlating one word in A with
one and only one semantic equivalent in B, with no synonyms in either
language, you'd expect to find about 100 matches between any first
consonant in A and any first consonant in B (chance that x will occur as
first consonant in A: 1/10, multiplied by chance that y will occur as
first consonant in B: 1/10, multiplied by  total places where 1st C of a
word in A can be compared with 1st C of word in B: 10,000). So you
wouldn't be justified in suspecting a historical relationship till you
got a good bit over 100 matches.  On the other hand if you had two
languages with 25 consonants evenly distributed and a lexicon based on
a 1000 word random sample, you'd expect an average of only 1.6 first
consonant matches (1/25 * 1/25 * 1000). So you'd be justified to suspect
a non-accidental, hence historical relationship even with as few as 4 or
5 matches.
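
The two worked examples above follow one formula: the expected number of
chance matches for a given pair of first consonants is (1 / inventory size
of A) * (1 / inventory size of B) * number of word pairs, assuming the
consonants are evenly distributed. A minimal Python sketch, with an
illustrative function name that does not come from the message:

```python
def expected_chance_matches(inventory_a, inventory_b, n_pairs):
    """Expected chance matches for one given pair of first consonants.

    Assumes consonants are evenly distributed in each language, so a given
    consonant occurs word-initially with probability 1 / inventory size.
    """
    return (1 / inventory_a) * (1 / inventory_b) * n_pairs

# Ten consonants each, 10,000-entry dictionary.
print(expected_chance_matches(10, 10, 10_000))

# Twenty-five consonants each, 1000-word sample.
print(expected_chance_matches(25, 25, 1_000))
```

With uneven distributions the factors 1 / inventory size would be replaced
by the observed relative frequencies of the two consonants in that position.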


 Bobby D. Bryant wrote:

>
> In short, I don't think such a formalization of the problem in terms
> of N and C is going to work in practice.  At some level you are always
> going to have to pile on enough examples to convince your peers, which
> is of course the way things have always worked.
>

Piling on enough examples to convince your peers no longer works in
practice, or else the long-distance comparison debates would not have
become as acrimonious as they have. Formalizing the problem seems to me
to be the only way forward. Besides, isn't that where the joy of
research lies-- in ever sharpening and refining our understanding of our
subject matter and of the tools we use to analyze it?


-- -----------------------------------------------------------
Robert R. Ratcliffe
Dept. of Linguistics and Information Science
Tokyo University of Foreign Studies
Asahi-machi 3-11-1,
Fuchu-shi, Tokyo
183-8534 Japan


From jer at cphling.dk  Sun Sep  3 14:19:19 2000
From: jer at cphling.dk (Jens Elmegaard Rasmussen)
Date: Sun, 3 Sep 2000 10:19:19 EDT
Subject: Q: the 'only six' argument
In-Reply-To: <39AE4FFF.1750AFC5@mail.utexas.edu>
Message-ID: <SUN.3.SEP.2000.101919.EDT.JER@CPHLING.DK>

----------------------------Original message----------------------------
Dear List,

I have occasionally shocked my students by insisting that ONE *probative*
example is enough to prove the point for which it is probative. The
statement, of course, is tautological: If the example did NOT prove its
point, it would not be probative, for that's what the word probative
means.
   The consequence is that, e.g., in Indo-European, certain disputed
groupings MUST be accepted unless we are willing to swallow very awkward
camels: If the Celtic superlative in *-isamo- and the Italic one in
*-is(s)amo- cannot be imagined to be parallel developments (from *-mHo-
[whence Ital./Celt *-amo-] with deictic vs. *-isto- with other
adjectives), and one cannot be assumed to have been borrowed from the
other (would you borrow a new form of the superlative, if your language
has a perfectly good one already?), then there WAS an Italo-Celtic node in
the splitting-up of the IE unity.  Similar arguments could be set up for
some of the points uniting Baltic and Slavic which look strong enough in
themselves to carry the burden of proof even if they were not supported by
others.
   Nice to see the list blossoming again.

   Jens E. Rasmussen


From hwhatting at hotmail.com  Mon Sep  4 12:46:49 2000
From: hwhatting at hotmail.com (Hans-Werner Hatting)
Date: Mon, 4 Sep 2000 08:46:49 EDT
Subject: Q: the 'only six' argument
Message-ID: <MON.4.SEP.2000.084649.EDT.HWHATTING@HOTMAIL.COM>

----------------------------Original message----------------------------
On Sun, 3 Sep 2000 10:19:19 EDT, J. E. Rasmussen wrote:
>I have occasionally shocked my students by insisting that ONE *probative*
>example is enough to prove the point for which it is probative. The
>statement, of course, is tautological: If the example did NOT prove its
>point, it would not be probative, for that's what the word probative
>means.
>    The consequence is that, e.g., in Indo-European, certain disputed
>groupings MUST be accepted unless we are willing to swallow very awkward
>camels: If the Celtic superlative in *-isamo- and the Italic one in
>*-is(s)amo- cannot be imagined to be parallel developments (from *-mHo-
>[whence Ital./Celt *-amo-] with deictic vs. *-isto- with other
>adjectives), and one cannot be assumed to have been borrowed from the
>other (would you borrow a new form of the superlative, if your language
>has a perfectly good one already?), then there WAS an Italo-Celtic node in
>the splitting-up of the IE unity.  Similar arguments could be set up for
>some of the points uniting Baltic and Slavic which look strong enough in
>themselves to carry the burden of proof even if they were not supported by
>others.

Languages don't borrow words or formations only because they lack an
adequate expression for a concept. Simply imitating a formation seen as more
expressive, or the usage of a language seen as more prestigious, also
plays a role. A good example from modern German is the borrowing of the
English way of expressing the year in which an event happened. The
traditional way in German is to say "Es geschah 1999.", but now one quite
often finds "Es geschah in 1999.", which is a clear calque on English. The
reason behind this is, of course, the great prestige of the English language,
how widely it is known, and also that this formation is more expressive
than the traditional German one. A superlative formation seems to be a good
candidate for borrowing on grounds of expressiveness.
I don't want to say that the superlative formation quoted cannot serve as
proof for Italo-Celtic unity. But if there is only one example (in this
case, of course, there is more than one, but the evidence is still
inconclusive), one can never exclude borrowing. The only thing it proves is
that the speakers of Proto-Celtic and Proto-Italic lived close
enough to borrow from one another.
I would like to add the following to the general discussion:
1.) No quantity of matches can ever prove genetic relationship. One can
probably find thousands of matches between, e.g., French and English or
Latin and Albanian, without Albanian or English being Romance languages.
2.) There is, as far as I know, some sort of communis opinio that certain
matches (from basic vocabulary, grammatical morphemes) are more important
for proving genetic relationship than others.
3.) I would recommend that if one has collected one's matches, one should
try a reconstruction. If the results are a decent basic vocabulary, and a
basic common grammar, the languages examined are most probably genetically
interrelated. There's of course the question how to define "decent basic
vocabulary" and "basic common grammar", and that's (besides the
questionableness of many matches) the main problem for wide-range
reconstructions like Nostratic, Proto-World etc. Anyone interested in
formulating some minimalist criteria?
4.) Always look at the history behind the matches. Are there historical
links between the speakers of the respective languages, and of what kind
are they? This is of course impossible if the history is not known, or if
one wants to use language to reconstruct history.
--
Essentially, I think a numerical approach does not take us very far. The
most important question seems to me to be: can we reconstruct a system based
on the matches, and what does it look like? If we get a basic grammar and basic
vocabulary, there are strong reasons to suspect genetic relationship; if
we get (say) a group of religious words, we can assume borrowing based on
religious influences, and so on. Here, of course, numbers play a role - one
simply needs a sufficient number of matches to constitute a system. But if
we have too small a number of matches to form a convincing system, only
historical evidence can help.

Best regards,
Hans-Werner Hatting, mag. phil.


From r.rankin at latrobe.edu.au  Mon Sep  4 12:40:49 2000
From: r.rankin at latrobe.edu.au (R. Rankin)
Date: Mon, 4 Sep 2000 08:40:49 EDT
Subject: Q: the 'only six' argument
Message-ID: <MON.4.SEP.2000.084049.EDT.R.RANKIN@LATROBE.EDU.AU>

----------------------------Original message----------------------------
Larry Trask wrote:

> Quite often, in my reading, I've come across a statement of the
> following type:
>         "The presence of only six good matches between two languages
>         is enough to show that the languages must be genetically related."
> ... the number is always different.  Six is the smallest I've ever seen, but
> I've also seen 15, 50 and various other numbers.

I don't recall often seeing such claims, but it is probable that I just
disregarded them and read on.  Personally, I'm very skeptical about the
possibility of developing any airtight criteria for genetic relationship that
will work cross-linguistically.  This sort of thing has to be done on a
case-by-case basis.  Factors may include various structural considerations
(phonological, morphological and lexical), likelihood of creolization,
likelihood of participation in a Sprachbund, etc.

Meillet is said to have remarked that one could tell if a language were
Indo-European or not just by examining the conjugation of the verb 'be'
(though at my present location I cannot give you a citation).  I tend to agree
with his insistence on morphological criteria, but still think there are far
too many potential variables for us to permit ourselves to be dogmatic.
Noodling around with "universal criteria" is an enterprise for synchronists;
we should not let ourselves be seduced into trying it in genetic linguistics.

Bob Rankin

--
Robert L. Rankin, Visiting Fellow
Research Center for Linguistic Typology
Institute for Advanced Study
La Trobe University
Bundoora, VIC 3083 Australia

Office: (+61 03) 9467-8087
Home:   (+61 03) 9499-2393


From degraff at MIT.EDU  Tue Sep  5 09:57:18 2000
From: degraff at MIT.EDU (Michel DeGraff)
Date: Tue, 5 Sep 2000 05:57:18 EDT
Subject: Q: the 'only six' argument
In-Reply-To: Your message of "Mon, 04 Sep 2000 08:46:49 EDT."
 <F159cIfLuEJmMcMHvPd00002a0d@hotmail.com>
Message-ID: <TUE.5.SEP.2000.055718.EDT.DEGRAFF@MIT.EDU>

----------------------------Original message----------------------------

Holding humbly and tightly on to my creolist-cum-syntactician hat, I would
like to inquisitively and constructively piggy-back on Hans-Werner
Hatting's observations and questions regarding (alleged) criteria for
genetic relatedness.

> 1.) No quantity of matches can ever prove genetic relationship. One can
> probably find thousands of matches between, e.g., French and English or
> Latin and Albanian, without Albanian or English being Romance languages.

In a similar vein, note that the etymology of Haitian Creole---a (so
called) "non-genetic" language---is overwhelmingly French while the lexicon
of Modern English---a (so called) "genetic" language---is mostly
non-Germanic etymologically.  Besides, virtually all Haitian Creole affixes
have cognates in French affixes whereas English has many affixes of
non-Germanic origins.

By the way, the latter observation about Haitian Creole suffices to
falsify all these `classic' Creole-genesis scenarios that posit an
affixless-pidgin phase a la Jespersen, Bickerton, McWhorter, Seuren, etc.

> 2.) There is, as far as I know, some sort of communis opinio that certain
> matches (from basic vocabulary, grammatical morphemes) are more important
> for proving genetic relationship than others.

Virtually all of Haitian Creole's grammatical morphemes are etymologically
French.

> 3.) I would recommend that if one has collected one's matches, one should
> try a reconstruction. If the results are a decent basic vocabulary, and a
> basic common grammar, the languages examined are most probably genetically
> interrelated. There's of course the question how to define "decent basic
> vocabulary" and "basic common grammar", and that's (besides the
> questionableness of many matches) the main problem for wide-range
> reconstructions like Nostratic, Proto-World etc. Anyone interested in
> formulating some minimalist criteria?

Given what I've noted above vis-a-vis lexicon and morphology, it then seems
that *absence* of "basic common grammar" would be *the* structural
criterion for claiming that Creole languages such as Haitian Creole are
"non-genetic" languages that arose via "abnormal transmission" whereas
French, say, is a "genetic" language that arose via "normal transmission".

Let me try and be more precise as to what I think are the implications of
a hypothetical "basic common grammar" with respect to the
genetic-vs-non-genetic hypothesis as it applies to, say, Haitian Creole
vs. French. Whatever features define this "basic common grammar", these
features must diverge when comparing the grammars of (colloquial) 17th-18th
century French dialects to that of Haitian Creole, and such divergences
must be *qualitatively* different than their counterparts in the
("genetic") course of French diachrony.

So far, I have not been able to isolate such features. Whatever divergences
exist between colloquial 17th-18th century French dialects and Haitian
Creole (e.g., `loss' of verbal inflection, verb-placement differences, etc.)
seem to have counterparts in the diachronic course of `genetic' languages.
And what I find most intriguing is that such divergences in `genetic'
diachrony also seem to coincide with the history of contact within these
`genetic' diachronies.  This was, of course, noted by Meillet, although he
would most likely not agree with the conclusions I seem drawn to.

In any case, if the "basic common grammar" remains elusive, then perhaps
it's time to seriously (re-)challenge the alleged (non-)genetic dichotomy
between Creole and non-Creole languages and/or the very concept of "genetic
relatedness" as a linguistically (i.e., *structurally*) definable concept.

Then again, I still need to learn more about the structural basis of
genetic linguistics.  This, I look forward to.

                                 -michel.
___________________________________________________________________________
MIT Linguistics & Philosophy, 77 Massachusetts Ave, Cambridge MA 02139-4307
degraff at MIT.EDU        http://web.mit.edu/linguistics/www/degraff.home.html
___________________________________________________________________________


From larryt at cogs.susx.ac.uk  Tue Sep  5 09:59:09 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Tue, 5 Sep 2000 05:59:09 EDT
Subject: Sum: the 'only six' argument
Message-ID: <TUE.5.SEP.2000.055909.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
I was planning to post a summary of the responses to my query
last week about the 'only six' argument.  However, after the
first few respondents replied to me privately, the responses
shifted to the list, and so all of you will now have seen
most of the responses already.

I will therefore content myself with reporting that no one
who has so far replied has expressed any great sympathy with
any version of the 'only six' argument, and several people
have been openly hostile.

These negative responses don't surprise me at all.  I am
certainly not sympathetic to the 'only six' argument.  It's
just that I keep coming across claims of this sort every now
and again, and I was beginning to wonder if a significant
number of historical linguists were embracing such arguments.
Apparently not.

Anyway, I hope we may continue the discussion on the list,
so long as Dorothy is willing.  My mail spool has been rather
short of interesting historical discussions since the IE list
suddenly collapsed last April.

My thanks to everyone who has replied.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From larryt at cogs.susx.ac.uk  Fri Sep  8 12:18:58 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Fri, 8 Sep 2000 08:18:58 EDT
Subject: Q: German Forst 'forest'
Message-ID: <FRI.8.SEP.2000.081858.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
This is an etymological question.

English 'forest' is, of course, borrowed from Old French,
where it goes back to Late Latin <forestis (silva)> 'outer forest',
with the first element possibly from <foris> 'outside'.

I had always assumed that German <Forst> 'forest' had the same
origin.  But, on checking, I find that things are more complicated.

Some sources agree that the German word is of the same origin
as the English one.  But other authorities, including Kluge,
give a quite different etymology.  They derive <Forst> from an
unrecorded *<forhist>, a derivative of Old High German <foraha>
'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' >
'conifer forest' > 'forest'.  Davis, in his English edition of Kluge,
observes that opinion is divided on this etymology.

Just to complicate things, Middle High German had a word <forest>
'forest', which even the proponents of Kluge's etymology seem to
agree is derived from Latin and unrelated to modern <Forst>.

So, my question is this.  Is there now general agreement on the
etymology of <Forst>?  Or is the question still up in the air?

I ask because, if the Germanic etymology of <Forst> is confirmed,
then 'forest' and <Forst> constitute one of the most wonderful
chance resemblances I have ever seen -- right up there with
English 'much' and Spanish <mucho> 'much', and English 'bad' and
Persian <bad> 'bad'.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From paoram at unipv.it  Sat Sep  9 19:04:21 2000
From: paoram at unipv.it (Paolo Ramat)
Date: Sat, 9 Sep 2000 15:04:21 EDT
Subject: R:      Q: German Forst 'forest'
Message-ID: <SAT.9.SEP.2000.150421.EDT.PAORAM@UNIPV.IT>

----------------------------Original message----------------------------

-----Original message-----
From: Larry Trask <larryt at cogs.susx.ac.uk>
To: HISTLING at VM.SC.EDU <HISTLING at VM.SC.EDU>
Date: Saturday, 9 September 2000 1:55
Subject: Q: German Forst 'forest'

>----------------------------Original message----------------------------
>This is an etymological question.
>
>English 'forest' is, of course, borrowed from Old French,
>where it goes back to Late Latin <forestis (silva)> 'outer forest',
>with the first element possibly from <foris> 'outside'.
>
>I had always assumed that German <Forst> 'forest' had the same
>origin.  But, on checking, I find that things are more complicated.
>
>Some sources agree that the German word is of the same origin
>as the English one.  But other authorities, including Kluge,
>give a quite different etymology.  They derive <Forst> from an
>unrecorded *<forhist>, a derivative of Old High German <foraha>
>'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' >
>'conifer forest' > 'forest'.  Davis, in his English edition of Kluge,
>observes that opinion is divided on this etymology.
>
>Just to complicate things, Middle High German had a word <forest>
>'forest', which even the proponents of Kluge's etymology seem to
>agree is derived from Latin and unrelated to modern <Forst>.
>
>So, my question is this.  Is there now general agreement on the
>etymology of <Forst>?  Or is the question still up in the air?
>
>I ask because, if the Germanic etymology of <Forst> is confirmed,
>then 'forest' and <Forst> constitute one of the most wonderful
>chance resemblances I have ever seen -- right up there with
>English 'much' and Spanish <mucho> 'much', and English 'bad' and
>Persian <bad> 'bad'.
>
>
>Larry Trask
>COGS
>University of Sussex
>Brighton BN1 9QH
>UK
>
>larryt at cogs.susx.ac.uk
>
>Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
>Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)

Dear Larry,
the etymology of Germ. _Forst_ proposed also in the 23rd ed. of Kluge's
Etym. Wtb. der dt. Spr. (by E. Seebold, 1995) sounds rather unconvincing.
From a Gmc. *_forhist_ "Gehegtes" we should have MHG _foerhest_ (with
Umlaut) and NHG *_foerst_, just as we get _lengest_ (<*_langisto_),
_ermest_ (<*_armisto_) etc., and NHG _laengst_, _aermst_.
Moreover, _Forst_ seems not to be ProtoGmc.: it is attested in Germ. and
Dutch (_vorst_) only. Thus I think you are right: the hypothesis that we
have here a loanword from Latin seems more plausible than the other one.
Also De Vries, Nederl. etymol. Woordenb., says that OHG _forst_ (ca. 800)
may derive from MLat. _forestis_, "reeds in 648 in een oorkonde voor
Stavelot-Malmédy" ('already in 648 in a charter for Stavelot-Malmédy').

Best,
Paolo


From ratcliff at fs.tufs.ac.jp  Sat Sep  9 19:05:04 2000
From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe)
Date: Sat, 9 Sep 2000 15:05:04 EDT
Subject: Sum: the 'only six' argument
Message-ID: <SAT.9.SEP.2000.150504.EDT.>

 Larry Trask wrote:

> I will therefore content myself with reporting that no one
> who has so far replied has expressed any great sympathy with
> any version of the 'only six' argument, and several people
> have been openly hostile.
>
> These negative responses don't surprise me at all.  I am
> certainly not sympathetic to the 'only six' argument.  It's
> just that I keep coming across claims of this sort every now
> and again, and I was beginning to wonder if a significant
> number of historical linguists were embracing such arguments.
> Apparently not.

Wait just a second there. I may have sounded negative myself. But when I
thought about it a little more, I realized that there is a legitimate
and interesting argument there, and it ought to be in historical
linguistics textbooks if it isn't. (I don't know if this is the
argument you have seen, but I'd be interested to know if it IS in any
textbooks.)

Basically, IF one has set up the question properly and IF one has
carried out the comparison with discipline and honesty (big ifs, of
course), then a very small number of examples of a single sound
correspondence is sufficient to demonstrate a historical (not
necessarily a genetic) relationship beyond any reasonable doubt.
Practically speaking, given the sample sizes we usually work with and
the way that phonological systems are set up, in most cases, the
necessary number is indeed around six or not much more. This isn't
anything for anyone to be hostile to (or sympathetic to, for that
matter); it simply follows necessarily from the logic of probability.
I'll explain, but first a clarification.

When you ask about numerical criteria for a genetic relationship, you
are asking (at least) two separate questions. Most of the respondents
addressed the second question-- what are the criteria for determining
whether two historically related languages are related genetically, as
opposed to being related by contact or borrowing, or by being in a
lexifier-creole relationship. Some respondents addressed the question of
what criteria are relevant for subclassifying genetically related
languages. As far as I can see (and as most of the respondents said),
numerical criteria simply are not relevant for making these kinds of
judgements. It's the nature of the similarities or commonalities, not
the number of them, that counts. In any case probability theory doesn't
come into play because in all these cases we have already ruled out
coincidence as an explanation.

But when approaching unclassified languages or languages which haven't
been compared to each other before, the first question we have to ask is
whether these languages have something in common which cannot be due to
chance or coincidence. Numerical criteria and probability theory are the
most reliable means for making judgements of this type.

Here's how you end up with only six. First, the average expected number
of chance matches between any two consonants in any two languages (that
is, the expected number of times the consonants will appear in the same
position in a word with the same meaning) is the frequency of the first
consonant in its language times the frequency of the second consonant in
its language times the number of word pairs available for comparison.
Thus if ten percent of the words start with /t/ in one language and ten
percent of the words in the other language start with /b/, then in a
100-word sample there should be (by chance) one case where the
translation of a word starting with /t/ in the first language starts
with /b/ in the second. In a 1000-word sample there should be about
ten such cases.

One rough guide to the frequency of a consonant is simply 1
over the number of consonants in the inventory. So if you have twenty
consonants, the average frequency of each consonant is 1/20 or .05. If
you have a Macintosh with a graph calculator, try entering this formula:
(1/x^2) * n * 100 (one over x squared, times n, times 100). This gives you
the expected number of chance correspondences, in a sample of n * 100 word
pairs, for two languages that both have x consonants, evenly distributed.
You can see from this that as long as the average size of the consonant
inventory is greater than 10 (or, put another way, where no consonant
occupies more than ten percent of the word positions being compared), the
expected number of chance matches in a 100-word sample is between 0 and 1.
That is, in a 100-word sample you expect that each consonant (in initial
position) in one language will match up with each consonant in the other
in one word or not at all. In a 1000-word sample the expected chance
averages are not all that much higher-- basically, if the average size of
the consonant inventories is 14 (or the average frequency no more than
1/14), you only expect to get 5 chance correspondences, though below 14
the expected number starts to climb dramatically. (At 5 the expected
number is 40.)
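
The arithmetic in this paragraph can be checked with a few lines of code.
What follows is only a minimal sketch in Python; the function names are
illustrative and not taken from any published tool, and the second function
assumes, as the rough guide above does, that every consonant in an inventory
is equally frequent.

```python
def expected_chance_matches(freq_a, freq_b, n_pairs):
    """Expected number of chance matches between two consonants.

    freq_a, freq_b: relative frequency of each consonant in the compared
    word position of its own language; n_pairs: number of word pairs.
    """
    return freq_a * freq_b * n_pairs


def expected_for_inventory(n_consonants, n_pairs):
    """Same quantity under the rough guide: every consonant has frequency
    1 / n_consonants in both languages (an idealising assumption)."""
    freq = 1.0 / n_consonants
    return expected_chance_matches(freq, freq, n_pairs)


# Expected chance matches for several inventory sizes, for 100 and 1000 pairs.
for size in (5, 10, 14, 15, 20):
    print(size,
          round(expected_for_inventory(size, 100), 2),
          round(expected_for_inventory(size, 1000), 2))
```

The first function takes observed frequencies, so it can equally be fed
real counts from a word list rather than the even-distribution idealisation.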

The next question is how far above the average we have to get before
coincidence becomes an absurdly unlikely explanation. There is a formula
for this, but I won't go through it since this post has gotten long. But
here is one example: in the case where two languages both have 20
consonants evenly distributed (or, more realistically, in comparing two
consonants in two languages, both of which have a frequency of 5% in the
word-position being compared in their respective languages), the
probability of finding more than 5 correspondences (i.e. 6 or more) in a
100-word sample is 0.000000356, or roughly 1 in 2.8 million. (The chance
of finding 5 or more is roughly 1 in 163,000.) So in this set of
circumstances "6 or more" (i.e. a single correspondence set occurring in a
given position-- say word-initially-- in 6 or more words) should be
pretty well conclusive for demonstrating a non-chance and hence almost
certainly historical (genetic or contact) relationship.
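
The formula itself is not given here. One natural reading, and this is an
assumption rather than something stated in the post, is that each word pair
is an independent trial whose chance of a match is the product of the two
consonant frequencies, so that the number of chance matches follows a
binomial distribution. Under that assumption the tail probabilities can be
sketched in Python as below. Figures obtained this way are of the same order
of magnitude as those quoted above but need not agree with them digit for
digit, since the original calculation may have used a different formula or
rounding. The function name is illustrative only.

```python
def binomial_tail(k, n_pairs, p):
    """Return P(X >= k) for X ~ Binomial(n_pairs, p).

    The probabilities P(X = i) are built up by the usual recurrence and the
    upper tail is summed directly, which avoids huge binomial coefficients.
    Assumes (1 - p) ** n_pairs does not underflow to zero.
    """
    pmf = (1.0 - p) ** n_pairs      # P(X = 0)
    ratio = p / (1.0 - p)
    tail = 0.0
    for i in range(n_pairs + 1):
        if i >= k:
            tail += pmf
        # step from P(X = i) to P(X = i + 1)
        pmf *= (n_pairs - i) / (i + 1) * ratio
    return tail


# Two consonants, each with frequency 5% in the compared position,
# and a sample of 100 word pairs.
p_match = 0.05 * 0.05
for k in (4, 5, 6, 7):
    prob = binomial_tail(k, 100, p_match)
    print(k, prob, "about 1 in", round(1 / prob))
```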

I think that working all this out mathematically is interesting and
important for comparative linguistics for two reasons. First, it means
that if you apply the comparison strictly (allow only one-to-one word
comparisons, and one-to-one phoneme comparisons) you can get more
knowledge from less information-- you can potentially demonstrate a
relationship with much less data than comparativists have traditionally
thought necessary. This is important to me, because I work in
Afroasiatic, where the perpetual concern is exactly how to get more
knowledge with less information (few old texts for most languages).

But the other side of this is that the mathematics makes it
perfectly clear that if you relax the semantic and phonemic criteria far
enough, you quickly come to a point where the expected number of chance
correspondences becomes so high that it becomes practically impossible
to mount an effective demonstration of a relationship. The relevant
parameters are number of comparisons and frequency of consonants. If you
allow for comparison of each word with a wide range of semantically
close words, you multiply the number of comparisons and effectively
increase the sample size. (A pair of 1000-word lists with one-to-one
matching is the same mathematically as two 100-word lists with each word
compared with 10 words in the other language-- both give 1000 pairs or
trials.) Going back to the previous example with a frequency of 5% for
each consonant, the number of matches you need to get to the 1 in a
million or better range for different sample sizes is: 8 for 200 pairs,
10 for 500, 14 for 1000, and 19 for 2000. In other words, although the
expected number of chance correspondences increases only in proportion to
sample size, the number needed for reasonable certainty of non-chance goes
up faster in absolute terms. If you are considering each word in a
1000-word list against 20 or 30 semantically close words, the effective
sample size-- and hence the number of matches needed to demonstrate a
non-chance relationship-- becomes gigantic. (I don't have a calculator
powerful enough to calculate it though, sorry.)

Similarly, if you allow many-to-many phoneme matchings, you effectively
increase the frequency. If you compare two systems of 15 consonants at 3
points of articulation one-to-one, the chance of a match is on average
1/15 squared. The expected number of chance matches in a 1000-word sample
is between 4 and 5 (4.44)-- reasonable. The chance of matching any two
consonants at the same point of articulation is 1/3 squared. In a
1000-word sample the expected number of chance matches is 111-- a big jump.
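
How the required number of matches grows with the effective sample size can
be explored with a short script. This is a sketch only, in Python, and it
rests on an assumption not stated in the post: that chance matches follow a
binomial distribution, with each word pair an independent trial. The function
names and the one-in-a-million default threshold are illustrative choices,
and the thresholds it prints may differ slightly from the figures quoted
above, depending on how those were originally computed.

```python
def binomial_tail(k, n_pairs, p):
    """P(X >= k) for X ~ Binomial(n_pairs, p), summing the upper tail.

    Assumes (1 - p) ** n_pairs does not underflow to zero.
    """
    pmf = (1.0 - p) ** n_pairs      # P(X = 0)
    ratio = p / (1.0 - p)
    tail = 0.0
    for i in range(n_pairs + 1):
        if i >= k:
            tail += pmf
        pmf *= (n_pairs - i) / (i + 1) * ratio   # P(X = i) -> P(X = i + 1)
    return tail


def matches_needed(n_pairs, p, threshold=1e-6):
    """Smallest k for which k or more chance matches has probability
    at or below `threshold` under the binomial model."""
    for k in range(n_pairs + 2):
        if binomial_tail(k, n_pairs, p) <= threshold:
            return k


if __name__ == "__main__":
    p_strict = 0.05 * 0.05           # one-to-one matching, 5% frequencies
    print("pairs  expected  needed")
    for n_pairs in (100, 200, 500, 1000, 2000):
        print(n_pairs, n_pairs * p_strict, matches_needed(n_pairs, p_strict))

    # Each word of a 1000-word list compared with 20 semantically close
    # words: 20,000 effective trials.
    n_loose = 1000 * 20
    print(n_loose, n_loose * p_strict, matches_needed(n_loose, p_strict))

    # Many-to-many phoneme matching: any two consonants at the same one of
    # 3 points of articulation count as a match.
    p_place = (1.0 / 3.0) ** 2
    print(1000, 1000 * p_place, matches_needed(1000, p_place))
```

Relaxing the semantic criterion enters only through `n_pairs`, and relaxing
the phonemic criterion only through `p`, which mirrors the two parameters
named in the text.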

Thus with very loose criteria, the comparatist is in the paradoxical
position of having to prove the existence of hundreds of "bad" (random)
correspondences in order to have any confidence of having found any
good ones (ones which actually reflect language history). And if there
really are any good correspondences, the problem of how to pick them out
from all the random "noise" which is certain to be there is daunting.


  -- -----------------------------------------------------------
Robert R. Ratcliffe
Associate Professor, Arabic and Linguistics,
Dept. of Linguistics and Information Science
Tokyo University of Foreign Studies
Asahi-machi 3-11-1,
Fuchu-shi, Tokyo
183-8534 Japan


From kroch at change.ling.upenn.edu  Thu Sep 14 01:33:29 2000
From: kroch at change.ling.upenn.edu (Tony Kroch)
Date: Wed, 13 Sep 2000 21:33:29 EDT
Subject: Announcing the second edition of the Penn-Helsinki Parsed Corpus
 of Middle English
Message-ID: <WED.13.SEP.2000.213329.EDT.KROCH@CHANGE.LING.UPENN.EDU>

----------------------------Original message----------------------------
The second edition of the Penn-Helsinki Parsed Corpus of Middle English
(PPCME2) is now publicly available under the conditions outlined
below. It consists of 55 text samples containing 1.3 million words of
syntactically annotated Middle English prose and ranging over four
time periods, from 1150 to 1500.

Like the first edition of the PPCME, the PPCME2 is based on the
Middle English portion of the Helsinki Corpus of English Texts that
was created at the University of Helsinki under the direction of
Matti Rissanen and Ossi Ihalainen. The size of the text samples in
the second edition has been enlarged so that the total corpus size is
nearly three times larger. In addition, the corpus is now tagged for
part of speech and the syntactic annotation system is richer.
For the earliest time period, all texts except one are complete; the
exception is the Ancrene Riwle sample, which contains approximately
50,000 words. For the later time periods, two texts per time period
were expanded to approximately 50,000 words.  The remaining texts are
represented by the Helsinki Corpus sample.

The PPCME2 is being distributed on a CD-ROM that includes several files
for each text in the corpus:

        - a file with unannotated text
        - a file with philological and other information about the text
          (manuscript and edition used, date, dialect, genre, and word count
          of the sample)
        - a file in which individual words are tagged for part of speech
        - a file that is annotated for syntactic structure

Available with the corpus is CorpusSearch, a Java program written by
Beth Randall that runs under Unix, Linux, MacOS and Windows.
CorpusSearch uses standard syntactic predicates like ``(immediately)
precedes'', ``(immediately) dominates'', and Boolean combinations
thereof, and it allows the output of a previous search to be used as
input to further searches.

To order the PPCME2, please go to http://www.ling.upenn.edu/mideng and
follow the instructions there.

The cost of a subscription to the corpus is $200 and the cost of a
license for CorpusSearch is $50.  The items may be purchased together
or separately.  Proceeds from the sale of the corpus will pay for
improving the corpus and for increasing its size over time. Proceeds
from the sale of CorpusSearch will go to the author.

The PPCME2 was designed and built by Anthony Kroch and Ann Taylor at the
University of Pennsylvania.  Supplementary assistance was provided by
Beatrice Santorini.  The PPCME2 is part of a larger project to produce a
parsed diachronic corpus of English from 800 to 1800.  The Old English part
is under construction at York under the direction of Anthony Warner, Susan
Pintzuk, and Ann Taylor and the Early Modern English part is under
construction at the University of Pennsylvania under the direction of Kroch
and Santorini.


From Ann.Kumar at anu.edu.au  Thu Sep 14 11:01:01 2000
From: Ann.Kumar at anu.edu.au (Ann Kumar)
Date: Thu, 14 Sep 2000 07:01:01 EDT
Subject: "only six" argument
Message-ID: <THU.14.SEP.2000.070101.EDT.ANN.KUMAR@ANU.EDU.AU>

We have been following the HISTLING discussion initiated by Larry Trask
with interest, because we have been involved over the last two years in a
particular case in which we had to solve the problem of how much data is
necessary to establish relatedness. (Not genetic, but via borrowing.)  We
have been doing what Robert R. Ratcliffe takes as his starting point in his
last e-mail, i.e. "approaching unclassified languages or languages which
haven't been compared before [where] the first question we have to ask is
whether these languages have something in common which cannot be due to
chance or coincidence."  The results will be published in the December
issue of Oceanic Linguistics, but we thought it might interest LIST members
to have a sneak preview, at least of the (rather long) section on
probability, where we discuss relevant issues.  (The section is attached.)


We were trying to find out whether some semantic and phonological matches
in Old Japanese and Old Javanese lexis were too extensive to be due to
chance. In this particular case, rather than looking at single sound
correspondences, we used whole-word comparison, and of longer words (CVCVC
structure) with recurrent sound correspondences.  While it is not possible
to go into the calculations here, it turned out that in this case only one
match between words of this length could be expected to occur by chance. In
the section on probability Rose discusses the usefulness of the approaches
taken earlier by Nichols and Ringe and goes on to propose that a Bayesian,
rather than frequentist, statistical approach should be the preferred
option. We have attached this section.

We agree with Ratcliffe that "Numerical criteria and probability theory are
the most reliable means for making judgements of this type". But we are
able to demonstrate a few more things that might interest LIST readers,
(and can also offer some real data!).  As mentioned, we also have some
points to make concerning the appropriateness of the frequentist (as
opposed to a Bayesian) paradigm for evaluating questions of this kind (i.e.
assessing the probability of a hypothesis). (Bayesian formulations are
used, for example, in forensics. We don't know to what extent historical
linguists are aware of them, so we offer them in case people are
interested.)

Ann Kumar
Phil Rose


-------------- next part --------------
A non-text attachment was scrubbed...
Name: short_prob.doc
Type: application/mac-binhex40
Size: 71221 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/histling/attachments/20000914/da56d2a7/attachment.hqx>
-------------- next part --------------
 ===========================================================================
Dr Ann Kumar
Vice-President, Australian Academy of the Humanities
Centre for the Study of Asian Societies and Histories
Faculty of Asian Studies
Canberra ACT 0200
Australia
Tel. (02) 6249 3677/4658  fax. (02) 6279-8326


From X99Lynx at aol.com  Mon Sep 25 14:55:08 2000
From: X99Lynx at aol.com (Steve Long)
Date: Mon, 25 Sep 2000 10:55:08 EDT
Subject: Superlative Forms and Swallowing Camels
Message-ID: <MON.25.SEP.2000.105508.EDT.>

----------------------------Original message----------------------------
On Sun, 3 Sep 2000 10:19:19 EDT. jer at cphling.dk wrote:

<<The consequence is that, e.g., in Indo-European, certain disputed groupings
MUST be accepted unless we are willing to swallow very awkward camels:...
[e.g.,] the Celtic superlative in *-isamo- and the Italic one in *-is(s)amo-
... cannot be assumed to have been borrowed from the other (would you borrow
a new form of the superlative, if your language has a perfectly good one
already?)....>>

(Hi, Jens!)
I must ask of course how we know that one language or the other already had
"a perfectly good form of the superlative?"

With all due respect to the writer, from whom I've already learned a great
deal, I must ask whether the case is as clear cut as he perceives it.  This
camel might be the kind you find in animal cracker boxes -- bite-sized.

Ironically, two relevant languages make no morphological distinction between
the comparative and the superlative - Manx and French.  If this says nothing
else, it proves that languages can find themselves without any form of the
superlative, much less "a perfectly good one."

Whatever forces caused the loss of the superlative in those languages may
have caused an earlier loss in either Celtic or Italic.  And that would have
meant one or the other of those two languages may have been in need of a
superlative form and therefore had a very good reason to borrow it.

And doesn't the question <<would you borrow a new form of the superlative, if
your language has a perfectly good one already?>> work both ways?  Why would
"Italo-Celtic" innovate a superlative form when they already had a perfectly
good one?

In my mind this raises again the question of how one distinguishes between a
borrowing and descent from a common ancestor, IF the word or form is actually
old enough to predate indicia of borrowing.

Also, there are those of us who suspect that going back 4000+ years creates a
great deal of uncertainty about what languages -- both IE and non IE -- the
form could have been borrowed from.

The reconstruction the author offers -- "the Celtic superlative in *-isamo-
and the Italic one in *-is(s)amo- cannot be imagined to be parallel
developments (from *-mHo- [whence Ital./Celt *-amo-] with deictic vs. *-isto
with other adjectives)" -- does not foreclose the possibility that the
development is one that occurred in some third language (or the specialized
dialect of an influential, itinerant linguistic community -- like scribes or
priests) and that both Latin and Celtic "borrowed" it independently.

And finally, why would a language borrow a word like "superlative" when
presumably back in the days of Old English, it "already had a perfectly good
one?"

Regards,
Steve Long


From larryt at cogs.susx.ac.uk  Tue Sep 26 14:35:03 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Tue, 26 Sep 2000 10:35:03 EDT
Subject: Sum: German Forst 'forest'
Message-ID: <TUE.26.SEP.2000.103503.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
Some days ago I posted a query about the disputed etymology of
German <Forst> 'forest'.  I got only three replies, but those
were interesting.

The query was whether German <Forst> derives, like English 'forest',
from a late Latin word, or whether it is a native word derived
ultimately from the German word for 'fir tree'.

Two of the respondents were skeptical of the German etymology.
One of them suggested it might be a residue of the unfortunate
Romantic tendency to seek "Germanic" etymologies for loans from
Latin.  The third, however, was much more enthusiastic about the
Germanic etymology, and noted that the derivation of late Latin
<forestis> from <foris> 'outside' is far from secure, and that
a loan from Germanic has been suggested.  Well, turnabout is fair
play, I guess.

Anyway, it appears that I cannot yet add 'forest' and <Forst>
to my little collection of striking chance resemblances.  But one
of my respondents (SG) sent in a couple of lovely examples of
chance resemblances:

German /Scheune/ "shack" : Coptic /shoine/ id.
German /Schuh/ "shoe" : Itelmen /sxu/ (works even better with Dutch)
aso.

(Itelmen is a Chukcho-Kamchatkan language of eastern Siberia.)

My thanks to David Fertig, Stefan Georg, and Paolo Ramat.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From larryt at cogs.susx.ac.uk  Wed Sep 27 11:35:18 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Wed, 27 Sep 2000 07:35:18 EDT
Subject: Q: Sarich and historical linguistics
Message-ID: <WED.27.SEP.2000.073518.EDT.LARRYT@COGS.SUSX.AC.UK>

----------------------------Original message----------------------------
In a few weeks, I'm giving a talk on the perception of language
and linguistics among our academic colleagues in other disciplines,
such as psychology, anthropology, archaeology, primatology and
genetics.  Most of this talk will deal with non-historical
matters, but I want also to talk about the seemingly immense
influence of the long-rangers among our colleagues, who often
appear to believe that the long-rangers speak for historical
linguistics.  See, for example, the writings of the geneticist
Robert Sokal, of the palaeoanthropologist Richard Klein, and of
the primatologist Robin Dunbar.

But I've become particularly interested in the writings of the
eminent molecular anthropologist Vincent Sarich, one of the
founders of the out-of-Africa hypothesis of human origins.
Unlike most other non-linguists, Sarich has stepped into
historical linguistics in a big way -- and he doesn't like us
historical linguists very much.  In a 1994 article, he
warmly defends the long-rangers, and he hurls abuse at those
linguists who have criticized their work, accusing the critics
of being anti-scientific and of acting from the basest motives:

Vincent M. Sarich (1994), 'Occam's razor and historical linguistics',
in M. Y. Chen and O. J. L. Tzeng (eds), In Honor of William S.-Y.
Wang, Pyramid Press, pp. 409-430.

But I'm more interested right now in another of Sarich's articles,
published on the Web in 1994 and apparently not published elsewhere.
This article also carries a good deal of abuse directed at the critics
of Greenberg and Ruhlen:

        http://pubpages.unh.edu/~jel/sarich.html

Here is the passage I'm interested in:

"A similar scenario would also appear to apply in the linguistic
realm, but to see it we first need to challenge the extremely
conservative current consensus among most linguists that relationships
among languages that diverged more than perhaps 7,000-8,000 years ago
are, at present, unknowable.  A simple exercise suffices here to show
that this consensus is unreasonably pessimistic.  One simply sits down
with, for example, Buck's A Dictionary of Selected Synonyms in the
Principal Indo-European Languages, a basic word list, and some
independent knowledge of two or more languages representing distinct
Indo-European groups.  I used English and Croatian, representing,
respectively, its Germanic and Slavic branches.  If one then asks what
proportion of the words in modern Croatian appear, simply by inspection
(but allowing for some phonetic and semantic drift), to be cognate with
the reconstructed Proto-Indo-European (PIE) form (or, where that is
unavailable, the English word), one gets a minimum figure of about 60%.
For example, snow, snjeg, *sneigwh; many, mnogo, *monogho; blood, krv,
*kru; tree/wood, drvo, *dru; earth, zemlja, *ghem.  Similar results were
obtained using native speakers of Spanish and Bengali, and for Armenian
and Albanian using Decsy's The Indo-European Protolanguage: a
Computational Reconstruction.  Thus 60% survival seems to be a
reasonably representative figure for the survival of PIE roots with
meanings in extant Indo-European languages.

"Now obviously some number of these matches will be coincidental (though
that number will likely be small, as illustrated by the fact that
Chinese, by the same test, will show less than 10% apparent 'cognacy'
with PIE, English, or Croatian -- I am indebted to Dr W S-Y Wang for
this comparison), but, by the same token, some will be missed when the
degree of phonetic or semantic change makes cognacy less than obvious.
For example -- foot, noga, *ped -- where one might miss the English
correspondence because of the phonetic changes, and would (and, perhaps,
should) certainly miss the Croatian unless one remembered that 'pod' in
Croatian means 'under', and that an association between 'under' and
'foot' is perfectly reasonable.  This would imply a cognacy loss of less
than 10% per millennium along a lineage, implying that even at a time
depth of 12,000-14,000 years; that is, twice the probable time which
separates modern Croatian from its Proto-Indo-European ancestor, one
might retain 30% or so phonetic/semantic cognacy.  Thus one could
recognize relationships among languages whose common ancestor lay that
far in the past provided that one looked at a sufficient number of them,
and avoided simple binary comparisons.  That is, if each of two
descendant languages retains 30% cognacy with the ancestral language,
they will, on average, share only 9% [(0.3)^2] with one another -- and
this gets into the chance area of similarity.  On the other hand, if you
look at 10 such languages, three, on the average, will retain a
particular cognate -- greatly increasing your chances of recognizing
relationships among them, and of reconstructing the ancestral form.
This is the procedure and argument of Greenberg [(1987); see also
discussion in Ruhlen (1987)], and, whatever the questions that might be
raised about certain details, there can be no doubt the current general
consensus among most linguists that relationships among languages older
than about 7,000 years are, at present, unknowable, is unrealistically
and unreasonably pessimistic and conservative."  [END QUOTE]
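
The arithmetic in the quoted passage can be checked with a short sketch.
It is only an illustration under assumptions the passage itself implies:
a constant retention rate per millennium, independent loss in each
language, and a PIE time depth of 6 to 7 millennia (half of the quoted
12,000-14,000 years).  The function names are hypothetical and are not
taken from Sarich's article.

```python
def retention_rate_per_millennium(retained_fraction, millennia):
    """Constant per-millennium retention r with r**millennia == retained_fraction."""
    return retained_fraction ** (1.0 / millennia)


def retained_after(rate_per_millennium, millennia):
    """Fraction of the ancestral vocabulary still retained after `millennia`."""
    return rate_per_millennium ** millennia


def shared_by_pair(retained_fraction):
    """Expected fraction retained by BOTH of two independent descendants."""
    return retained_fraction ** 2


def expected_retainers(retained_fraction, n_languages):
    """Expected number of languages, out of n, retaining a given item."""
    return retained_fraction * n_languages


if __name__ == "__main__":
    for depth in (6, 7):  # assumed PIE depth in millennia
        rate = retention_rate_per_millennium(0.60, depth)
        print(f"depth {depth}k years: loss per millennium = {1 - rate:.3f}, "
              f"retained at twice that depth = {retained_after(rate, 2 * depth):.2f}")
    print(f"pairwise overlap at 30% retention: {shared_by_pair(0.30):.2f}")
    print(f"expected retainers among 10 languages: {expected_retainers(0.30, 10):.1f}")
```

Under these assumptions, 60% retention over 6 to 7 millennia corresponds
to a loss of roughly 7-8% per millennium, and doubling the time depth
gives 0.6^2 = 36%, which the passage rounds down to "30% or so".  The 9%
pairwise figure and the "three out of ten" figure follow directly from
the 30% value.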

Now, many of these general issues have been much discussed elsewhere,
and I have my own views, which I will reserve for the time being.
But I am interested in hearing comments from colleagues on any part of
this passage, though most particularly on the following points:

        *the use to which Sarich puts Buck's dictionary;

        *the claim that any given living IE language retains about 60%
        of the PIE lexicon in easily recognizable form;

        *the claim that genuine cognates among living IE languages are
        overwhelmingly obvious and trivial to identify by inspection alone;

        *the claim that this result automatically generalizes to other
        families, even to families which are as yet unrecognized.

Please reply directly to me, since I have no wish to flood this list
with discussions of long-ranger work.  I'll post a summary when I can.


Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)


From DISTERH at UNIVSCVM.SC.EDU  Fri Sep 29 12:17:40 2000
From: DISTERH at UNIVSCVM.SC.EDU (Dorothy Disterheft)
Date: Fri, 29 Sep 2000 08:17:40 EDT
Subject: Sarich and historical linguistics
Message-ID: <FRI.29.SEP.2000.081740.EDT.DISTERH@UNIVSCVM.SC.EDU>

In a message dated 9/27/2000 6:36:24 AM, larryt at cogs.susx.ac.uk writes:

<<... I want also to talk about the seemingly immense influence of the
long-rangers among our colleagues, who often appear to believe that the
long-rangers speak for historical linguistics.
...I've become particularly interested in the writings of the eminent
molecular anthropologist Vincent Sarich, one of the founders of the
out-of-Africa hypothesis of human origins. Unlike most other non-linguists,
Sarich has stepped into historical linguistics in a big way -- and he doesn't
like us historical linguists very much.>>

I hope Larry and everyone else will understand my posting this to the list.
I think it's important to just add a few observations about Sarich that may
put his remarks in context.

First of all, it should be remembered that Vincent Sarich has for a long time
taken an advocacy position (and called himself an advocate) regarding certain
aspects of human genetics.  He has been for example prominently involved in
the dialogue on race and IQ.  And it should also be noted that the article
Prof Trask cites (http://pubpages.unh.edu/~jel/sarich.html), entitled "RACE
and LANGUAGE in PREHISTORY", is clearly a piece of "advocacy," which
obviously treats historical linguistics only as it relates to and serves to
advance Sarich's goals with regard to a somewhat larger argument.

Sarich's position on Greenberg and longrangers is pretty much dictated by the
Out-of-Africa hypothesis and various other positions Sarich takes regarding
genetics and human culture.

What is clear from the piece is that Sarich is trying to backdate language far
enough to make its diversity correlate with current human genetic diversity.
Sarich advocates the view that modern human diversity, human intelligence and
cultures were born full-blown at some point after the Out-of-Africa event
some 100,000 years ago -- with relatively little convergence since.  In the
piece, his argument with scientists claiming that language is a recent
development is expressly motivated by his position that language matches up
with racial genetics.  Sarich is not really a lumper in the strict sense.

And given all the above some caution might be called for in using Sarich as
representative of an academic non-linguist's views of historical linguistics.
I suspect that if it better served his larger purposes, he would be citing
Lehmann and Trask.

This isn't the first time of course that historical linguistics has been
called upon to support wider conclusions about human history.  Sarich is
fairly unique however in viewing certain elements of it as supporting
conclusions that reach back some 30,000 years.

It should be said that there are serious scientists who are not comfortable
with Sarich's understanding of the evidence of paleo-culture, much less of
his understanding of paleo-language.  (And that's not to say that the genetic
implications of Out-of-Africa haven't been challenged either.)

Some of us think Sarich may be seriously underestimating paleo-humans and how
long it took to develop something as sophisticated as human biology, human
culture and human language.  On another web page, for example, one can find
an article by the formidable paleobiologist Henry Gee about the Schöningen
spears. (http://quartz.ucdavis.edu/~GEL115/spears.html)  To some, the
sophistication and possibly accumulative design of these 400,000 year old
hunting javelins suggests that they could not have been developed or
redeveloped in a single generation.  And accumulating and transmitting complex
knowledge from one generation to the next suggests some form of transmission,
perhaps some form of language.

Finally, I'd point out also that maybe it is the traditional assumption of
strict vertical descent in languages that makes any part of historical
linguistics attractive to Vincent Sarich and his "anti-convergenist"
monogenetic polemics.  Those of us who think that there may be a relatively
high degree of convergence in linguistic history don't find commonalities
between languages extremely precise in illuminating prehistory or necessarily
indicative of some common noble biological ancestor.  After all, the most
basic function of language is communication and that should move us all to
try to speak the same language, not different ones.

And, of course, it's refreshing for us "convergenists" to see that the
primacy of vertical descent has recently taken a good drubbing in biology.
(See, e.g., Stephen Jay Gould's "Linnaeus's Luck" in Natural History,
September 2000).  And some of us expect the same to eventually happen on a
different level to "Out-of-Africa".

In the meantime, it might be suggested that Vincent M. Sarich's views are not
at all the best reflection of how informed non-linguists understand
historical linguistics.

Steve Long