From ratcliff at fs.tufs.ac.jp Fri Sep 1 13:16:51 2000
From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe)
Date: Fri, 1 Sep 2000 09:16:51 EDT
Subject: Q: the 'only six' argument
Message-ID:

> ----------------------------Original message----------------------------
> Larry Trask wrote:
>
> > So, my question: does anybody believe that any version of this
> > statement is valid? More precisely, do we have a number N and
> > a set of criteria C such that the existence between two languages
> > of N matches satisfying criteria C is enough to guarantee that
> > the languages must be related?

Wasn't this what Donald Ringe was trying to do? (The Factor of Chance in Language Comparison, Philadelphia 1992). But the statement phrased as you have it is certainly not valid.

First, no number of "matches" (sound correspondences?) can *guarantee* that the languages are related, only that the probability of their being related is high.

Second, "related" has to be understood as historically related rather than genetically related, because numerical criteria only help to decide the issue of chance vs. non-chance similarity, not which type of historical contingency (descent from a common source or subsequent contact) may have produced the non-chance pattern.

Third, there is no absolute number valid in all cases, because it depends on the size and nature of the sample being compared. Specifically, in the case of sound correspondences, the bigger the dictionary or word list, the more chance correspondences can be expected; and the smaller the segment inventories of the languages compared, the more chance correspondences can be expected.
This is because the average expected number of chance occurrences of an event (in this case a correspondence at a given position in a word) is the probability of the event (in this case the relative frequencies, in the given position, of the segments compared, multiplied by each other) times the number of trials (in this case the number of semantically equivalent words available for comparison). So if you have two languages A and B, both of which have only ten consonants evenly distributed in word-first position, and you have an A-B dictionary which has 10,000 entries correlating one word in A with one and only one semantic equivalent in B, with no synonyms in either language, you'd expect to find about 100 matches between any first consonant in A and any first consonant in B (chance that x will occur as first consonant in A: 1/10, multiplied by chance that y will occur as first consonant in B: 1/10, multiplied by total places where 1st C of a word in A can be compared with 1st C of a word in B: 10,000). So you wouldn't be justified in suspecting a historical relationship till you got a good bit over 100 matches.

On the other hand, if you had two languages with 25 consonants evenly distributed and a lexicon based on a 1000-word random sample, you'd expect an average of only 1.6 first-consonant matches (1/25 * 1/25 * 1000). So you'd be justified in suspecting a non-accidental, hence historical, relationship even with as few as 4 or 5 matches.

Bobby D. Bryant wrote:

> In short, I don't think such a formalization of the problem in terms of N and
> C is going to work in practice. At some level you are always going to have
> to pile on enough examples to convince your peers, which is of course the way
> things have always worked.

Piling on enough examples to convince your peers no longer works in practice, or else the long-distance comparison debates would not have become as acrimonious as they have. Formalizing the problem seems to me to be the only way forward.
Besides, isn't that where the joy of research lies -- in ever sharpening and refining our understanding of our subject matter and of the tools we use to analyze it?

--
-----------------------------------------------------------
Robert R. Ratcliffe
Dept. of Linguistics and Information Science
Tokyo University of Foreign Studies
Asahi-machi 3-11-1, Fuchu-shi, Tokyo 183-8534 Japan

From jer at cphling.dk Sun Sep 3 14:19:19 2000
From: jer at cphling.dk (Jens Elmegaard Rasmussen)
Date: Sun, 3 Sep 2000 10:19:19 EDT
Subject: Q: the 'only six' argument
In-Reply-To: <39AE4FFF.1750AFC5@mail.utexas.edu>
Message-ID:

----------------------------Original message----------------------------
Dear List,

I have occasionally shocked my students by insisting that ONE *probative* example is enough to prove the point for which it is probative. The statement, of course, is tautological: If the example did NOT prove its point, it would not be probative, for that's what the word probative means.

The consequence is that, e.g., in Indo-European, certain disputed groupings MUST be accepted unless we are willing to swallow very awkward camels: If the Celtic superlative in *-isamo- and the Italic one in *-is(s)amo- cannot be imagined to be parallel developments (from *-mHo- [whence Ital./Celt *-amo-] with deictic vs. *-isto- with other adjectives), and one cannot be assumed to have been borrowed from the other (would you borrow a new form of the superlative, if your language has a perfectly good one already?), then there WAS an Italo-Celtic node in the splitting-up of the IE unity. Similar arguments could be set up for some of the points uniting Baltic and Slavic which look strong enough in themselves to carry the burden of proof even if they were not supported by others.

Nice to see the list blossoming again.

Jens E. Rasmussen

From hwhatting at hotmail.com Mon Sep 4 12:46:49 2000
From: hwhatting at hotmail.com (Hans-Werner Hatting)
Date: Mon, 4 Sep 2000 08:46:49 EDT
Subject: Q: the 'only six' argument
Message-ID:

----------------------------Original message----------------------------
On Sun, 3 Sep 2000 10:19:19 EDT, J. E. Rasmussen wrote:

>I have occasionally shocked my students by insisting that ONE *probative*
>example is enough to prove the point for which it is probative. The
>statement, of course, is tautological: If the example did NOT prove its
>point, it would not be probative, for that's what the word probative
>means.
> The consequence is that, e.g., in Indo-European, certain disputed
>groupings MUST be accepted unless we are willing to swallow very awkward
>camels: If the Celtic superlative in *-isamo- and the Italic one in
>*-is(s)amo- cannot be imagined to be parallel developments (from *-mHo-
>[whence Ital./Celt *-amo-] with deictic vs. *-isto- with other
>adjectives), and one cannot be assumed to have been borrowed from the
>other (would you borrow a new form of the superlative, if your language
>has a perfectly good one already?), then there WAS an Italo-Celtic node in
>the splitting-up of the IE unity. Similar arguments could be set up for
>some of the points uniting Baltic and Slavic which look strong enough in
>themselves to carry the burden of proof even if they were not supported by
>others.

Languages don't only borrow words or formations because they don't have an adequate expression for a concept. Simply imitating a formation seen as more expressive, or the usage of a language which is seen as more prestigious, also plays a role. A good example from modern German is the borrowing of the English way of expressing the year in which an event happened. The traditional way in German is to say "Es geschah 1999.", but now quite often one can find "Es geschah in 1999.", which is a clear calque on English.
The reason behind this is, of course, the great prestige of the English language, the widespread knowledge of it, and also that this formation is more expressive than the traditional German one. A superlative formation seems to be a good candidate for borrowing on grounds of expressiveness.

I don't want to say that the superlative formation quoted cannot serve as proof for Italo-Celtic unity. But if there is only one example (in this case, of course, there is more than one, but the evidence is still inconclusive), one can never exclude borrowing. The only thing it proves is that the speakers of Proto-Celtic and Proto-Italic lived close enough to borrow from one another.

I would like to add the following to the general discussion:

1.) No quantity of matches can ever prove genetic relationship. One can probably find thousands of matches between, e.g., French and English or Latin and Albanian, without Albanian or English being Romance languages.

2.) There is, as far as I know, some sort of communis opinio that certain matches (from basic vocabulary, grammatical morphemes) are more important for proving genetical relationship than others.

3.) I would recommend that if one has collected one's matches, one should try a reconstruction. If the results are a decent basic vocabulary, and a basic common grammar, the languages examined are most probably genetically interrelated. There's of course the question how to define "decent basic vocabulary" and "basic common grammar", and that's (besides the questionableness of many matches) the main problem for wide-range reconstructions like Nostratic, Proto-World etc. Anyone interested in formulating some minimalist criteria?

4.) Always look at the history behind the matches. Are there historical links between the carriers of the respective languages, and of which kind are they? This is of course impossible if the history is not known, and if one wants to use language to reconstruct history.
-- Essentially, I think a numerical approach does not take us very far. The most important question seems to me: can we reconstruct a system based on the matches, and what does it look like? If we get a basic grammar and basic vocabulary, there are strong reasons to suspect genetical relationship; if we get (say) a group of religious words, we can assume borrowing based on religious influences, and so on. Here, of course, numbers play a role: one simply needs a sufficient number of matches to constitute a system. But if we have too small a number of matches to form a convincing system, only historical evidence can help.

Best regards,
Hans-Werner Hatting, mag. phil.

From r.rankin at latrobe.edu.au Mon Sep 4 12:40:49 2000
From: r.rankin at latrobe.edu.au (R. Rankin)
Date: Mon, 4 Sep 2000 08:40:49 EDT
Subject: Q: the 'only six' argument
Message-ID:

----------------------------Original message----------------------------
Larry Trask wrote:

> Quite often, in my reading, I've come across a statement of the
> following type:
> "The presence of only six good matches between two languages
> is enough to show that the languages must be genetically related."
> ... the number is always different. Six is the smallest I've ever seen, but
> I've also seen 15, 50 and various other numbers.

I don't recall often seeing such claims, but it is probable that I just disregarded them and read on. Personally, I'm very skeptical about the possibility of developing any airtight criteria for genetic relationship that will work cross-linguistically. This sort of thing has to be done on a case-by-case basis.
Factors may include various structural considerations (phonological, morphological and lexical), likelihood of creolization, likelihood of participation in a Sprachbund, etc. Meillet is said to have remarked that one could tell if a language were Indo-European or not just by examining the conjugation of the verb 'be' (though at my present location I cannot give you a citation). I tend to agree with his insistence on morphological criteria, but still think there are far too many potential variables for us to permit ourselves to be dogmatic. Noodling around with "universal criteria" is an enterprise for synchronists; we should not let ourselves be seduced into trying it in genetic linguistics.

Bob Rankin
--
Robert L. Rankin, Visiting Fellow
Research Center for Linguistic Typology
Institute for Advanced Study
La Trobe University
Bundoora, VIC 3083 Australia
Office: (+61 03) 9467-8087
Home: (+61 03) 9499-2393

From degraff at MIT.EDU Tue Sep 5 09:57:18 2000
From: degraff at MIT.EDU (Michel DeGraff)
Date: Tue, 5 Sep 2000 05:57:18 EDT
Subject: Q: the 'only six' argument
In-Reply-To: Your message of "Mon, 04 Sep 2000 08:46:49 EDT."
Message-ID:

----------------------------Original message----------------------------
Holding humbly and tightly on my creolist-cum-syntactician hat, I would like to inquisitively and constructively piggy-back on Hans-Werner Hatting's observations and questions regarding (alleged) criteria for genetic relatedness.

> 1.) No quantity of matches can ever prove genetic relationship. One can
> probably find thousands of matches between, e.g., French and English or
> Latin and Albanian, without Albanian or English being Romance languages.

In a similar vein, note that the etymology of Haitian Creole---a (so called) "non-genetic" language---is overwhelmingly French while the lexicon of Modern English---a (so called) "genetic" language---is mostly non-Germanic etymologically.
Besides, virtually all Haitian Creole affixes have cognates in French affixes whereas English has many affixes of non-Germanic origins. By the way, the latter observation about Haitian Creole suffices to falsify all these `classic' Creole-genesis scenarios that posit an affixless-pidgin phase a la Jespersen, Bickerton, McWhorter, Seuren, etc.

> 2.) There is, as far as I know, some sort of communis opinio that certain
> matches (from basic vocabulary, grammatical morphemes) are more important
> for proving genetical relationship than others.

Virtually all of Haitian Creole's grammatical morphemes are etymologically French.

> 3.) I would recommend that if one has collected one's matches, one should
> try a reconstruction. If the results are a decent basic vocabulary, and a
> basic common grammar, the languages examined are most probably genetically
> interrelated. There's of course the question how to define "decent basic
> vocabulary" and "basic common grammar", and that's (besides the
> questionableness of many matches) the main problem for wide-range
> reconstructions like Nostratic, Proto-World etc. Anyone interested in
> formulating some minimalist criteria?

Given what I've noted above vis-a-vis lexicon and morphology, it then seems that *absence* of "basic common grammar" would be *the* structural criterion for claiming that Creole languages such as Haitian Creole are "non-genetic" languages that arose via "abnormal transmission" whereas French, say, is a "genetic" language that arose via "normal transmission".

Let me try and be more precise as to what I think are the implications of a hypothetical "basic common grammar" with respect to the genetic-vs-non-genetic hypothesis as it applies to, say, Haitian Creole vs. French.
Whatever features define this "basic common grammar", these features must diverge when comparing the grammars of (colloquial) 17th-18th century French dialects to that of Haitian Creole, and such divergences must be *qualitatively* different from their counterparts in the ("genetic") course of French diachrony. So far, I have not been able to isolate such features. Whatever divergences exist between colloquial 17th-18th century French dialects and Haitian Creole (e.g., `loss' of verbal inflection, verb-placement differences, etc.) seem to have counterparts in the diachronic course of `genetic' languages. And what I find most intriguing is that such divergences in `genetic' diachrony also seem to coincide with the history of contact within these `genetic' diachronies. This was, of course, noted by Meillet, although he would most likely not agree with the conclusions I seem drawn to.

In any case, if the "basic common grammar" remains elusive, then perhaps it's time to seriously (re-)challenge the alleged (non-)genetic dichotomy between Creole and non-Creole languages and/or the very concept of "genetic relatedness" as a linguistically (i.e., *structurally*) definable concept.

Then again, I still need to learn more about the structural basis of genetic linguistics. This, I look forward to.

-michel.
___________________________________________________________________________
MIT Linguistics & Philosophy, 77 Massachusetts Ave, Cambridge MA 02139-4307
degraff at MIT.EDU http://web.mit.edu/linguistics/www/degraff.home.html
___________________________________________________________________________

From larryt at cogs.susx.ac.uk Tue Sep 5 09:59:09 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Tue, 5 Sep 2000 05:59:09 EDT
Subject: Sum: the 'only six' argument
Message-ID:

----------------------------Original message----------------------------
I was planning to post a summary of the responses to my query last week about the 'only six' argument.
However, after the first few respondents replied to me privately, the responses shifted to the list, and so all of you will now have seen most of the responses already.

I will therefore content myself with reporting that no one who has so far replied has expressed any great sympathy with any version of the 'only six' argument, and several people have been openly hostile.

These negative responses don't surprise me at all. I am certainly not sympathetic to the 'only six' argument. It's just that I keep coming across claims of this sort every now and again, and I was beginning to wonder if a significant number of historical linguists were embracing such arguments. Apparently not.

Anyway, I hope we may continue the discussion on the list, so long as Dorothy is willing. My mail spool has been rather short of interesting historical discussions since the IE list suddenly collapsed last April.

My thanks to everyone who has replied.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)

From larryt at cogs.susx.ac.uk Fri Sep 8 12:18:58 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Fri, 8 Sep 2000 08:18:58 EDT
Subject: Q: German Forst 'forest'
Message-ID:

----------------------------Original message----------------------------
This is an etymological question.

English 'forest' is, of course, borrowed from Old French, where it goes back to Late Latin <forestis (silva)> 'outer forest', with the first element possibly from <foris> 'outside'.

I had always assumed that German <Forst> 'forest' had the same origin. But, on checking, I find that things are more complicated.

Some sources agree that the German word is of the same origin as the English one. But other authorities, including Kluge, give a quite different etymology.
They derive <Forst> from an unrecorded *<forhist>, a derivative of Old High German <foraha> 'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' > 'conifer forest' > 'forest'. Davis, in his English edition of Kluge, observes that opinion is divided on this etymology.

Just to complicate things, Middle High German had a word <forest> 'forest', which even the proponents of Kluge's etymology seem to agree is derived from Latin and unrelated to modern <Forst>.

So, my question is this. Is there now general agreement on the etymology of <Forst>? Or is the question still up in the air?

I ask because, if the Germanic etymology of <Forst> is confirmed, then 'forest' and <Forst> constitute one of the most wonderful chance resemblances I have ever seen -- right up there with English 'much' and Spanish <mucho> 'much', and English 'bad' and Persian <bad> 'bad'.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)

From paoram at unipv.it Sat Sep 9 19:04:21 2000
From: paoram at unipv.it (Paolo Ramat)
Date: Sat, 9 Sep 2000 15:04:21 EDT
Subject: R: Q: German Forst 'forest'
Message-ID:

----------------------------Original message----------------------------
-----Original message-----
From: Larry Trask <larryt at cogs.susx.ac.uk>
To: HISTLING at VM.SC.EDU <HISTLING at VM.SC.EDU>
Date: Saturday, 9 September 2000 1.55
Subject: Q: German Forst 'forest'

>----------------------------Original message----------------------------
>This is an etymological question.
>
>English 'forest' is, of course, borrowed from Old French,
>where it goes back to Late Latin <forestis (silva)> 'outer forest',
>with the first element possibly from <foris> 'outside'.
>
>I had always assumed that German <Forst> 'forest' had the same
>origin. But, on checking, I find that things are more complicated.
>
>Some sources agree that the German word is of the same origin
>as the English one. But other authorities, including Kluge,
>give a quite different etymology. They derive <Forst> from an
>unrecorded *<forhist>, a derivative of Old High German <foraha>
>'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' >
>'conifer forest' > 'forest'. Davis, in his English edition of Kluge,
>observes that opinion is divided on this etymology.
>
>Just to complicate things, Middle High German had a word <forest>
>'forest', which even the proponents of Kluge's etymology seem to
>agree is derived from Latin and unrelated to modern <Forst>.
>
>So, my question is this. Is there now general agreement on the
>etymology of <Forst>? Or is the question still up in the air?
>
>I ask because, if the Germanic etymology of <Forst> is confirmed,
>then 'forest' and <Forst> constitute one of the most wonderful
>chance resemblances I have ever seen -- right up there with
>English 'much' and Spanish <mucho> 'much', and English 'bad' and
>Persian <bad> 'bad'.
>
>
>Larry Trask
>COGS
>University of Sussex
>Brighton BN1 9QH
>UK
>
>larryt at cogs.susx.ac.uk
>
>Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
>Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)

Dear Larry,

the etymology of Germ. _Forst_ proposed also in the 23rd ed. of Kluge's Etym. Wtb. der dt. Spr. (by E. Seebold, 1995) sounds rather unconvincing. From a Gmc. *_forhist_ "Gehegtes" we should have MHG _foerhest_ (with Umlaut) and NHG *_foerst_, just as we get _lengest_ (<*_langisto_), _ermest_ (<*_armisto_) etc., and NHG _laengst_, _aermst_. Moreover, _Forst_ seems not to be Proto-Gmc.: it is attested in Germ. and Dutch (_vorst_) only. Thus I think you are right: that we have here a loanword from Latin seems more plausible than the other hypothesis. Also De Vries, Nederl. etymol. Woordenb., says that OHG _forst_ (ca. 800) may derive from MLat. _forestis_, "reeds in 648 in een oorkonde voor Stavelot-Malmédy" [already in 648 in a charter for Stavelot-Malmédy].

Best,
Paolo

From ratcliff at fs.tufs.ac.jp Sat Sep 9 19:05:04 2000
From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe)
Date: Sat, 9 Sep 2000 15:05:04 EDT
Subject: Sum: the 'only six' argument
Message-ID:

Larry Trask wrote:

> I will therefore content myself with reporting that no one
> who has so far replied has expressed any great sympathy with
> any version of the 'only six' argument, and several people
> have been openly hostile.
>
> These negative responses don't surprise me at all. I am
> certainly not sympathetic to the 'only six' argument. It's
> just that I keep coming across claims of this sort every now
> and again, and I was beginning to wonder if a significant
> number of historical linguists were embracing such arguments.
> Apparently not.

Wait just a second there. I may have sounded negative myself. But when I thought about it a little more, I realized that there is a legitimate and interesting argument there, and it ought to be in historical linguistics textbooks if it isn't. (I don't know if this is the argument you have seen, but I'd be interested to know if it IS in any textbooks.)
Basically, IF one has set up the question properly and IF one has carried out the comparison with discipline and honesty (big ifs, of course), then a very small number of examples of a single sound correspondence is sufficient to demonstrate a historical (not necessarily a genetic) relationship beyond any reasonable doubt. Practically speaking, given the sample sizes we usually work with and the way that phonological systems are set up, in most cases the necessary number is indeed around six or not much more. This isn't anything for anyone to be hostile to (or sympathetic to, for that matter); it simply follows necessarily from the logic of probability.

I'll explain, but first a clarification. When you ask about numerical criteria for a genetic relationship, you are asking (at least) two separate questions. Most of the respondents addressed the second question -- what are the criteria for determining if two historically related languages are related genetically -- as opposed to being related by contact or borrowing, or by being in a lexifier-creole relationship. Some respondents addressed the question of what criteria are relevant for subclassifying genetically related languages. As far as I can see (and as most of the respondents said), numerical criteria simply are not relevant for making these kinds of judgements. It's the nature of the similarities or commonalities, not the number of them, that counts. In any case probability theory doesn't come into play, because in all these cases we have already ruled out coincidence as an explanation.

But when approaching unclassified languages or languages which haven't been compared to each other before, the first question we have to ask is whether these languages have something in common which cannot be due to chance or coincidence. Numerical criteria and probability theory are the most reliable means for making judgements of this type.
Here's how you end up with only six:

First, the average expected number of chance matches between any two consonants in any two languages (that is, the expected number of times the consonants will appear in the same position in a word with the same meaning) is the frequency of the first consonant in its language times the frequency of the second consonant in its language times the number of word pairs available for comparison. Thus if ten percent of the words start with /t/ in one language and ten percent of the words in the other language start with /b/, then in a hundred-word sample there should be (by chance) one case where the translation of a word starting with /t/ in the first language starts with /b/ in the second. In a 1000-word sample there should be about ten such cases.

One rough guide to the frequency of a consonant is simply 1 over the number of consonants in the inventory. So if you have twenty consonants, the average frequency of each consonant is 1/20 or .05. If you have a Macintosh with a graph calculator, try entering this formula: 1/x^2*n*100 (one over x squared times n times 100). This gives you the expected number of correspondences, in a sample of n*100 word pairs, for two languages both with x consonants, evenly distributed. You can see from this that as long as the average size of the consonant inventory is greater than 10 (or, put another way, where no consonant occupies more than ten percent of the word positions being compared), the expected number of chance matches in a 100-wd. sample is between 0 and 1. That is, in a 100-word sample you expect that each consonant (in initial position) in one language will match up with each consonant in the other in one word or not at all. In a 1000-word sample the expected chance avgs. are not all that much higher -- basically, if the average size of the consonant inventories is 14 (or the avg. frequency no more than 1/14), you only expect to get 5 chance correspondences, though below 14 the expected number starts to climb dramatically. (At 5 the expected number is 40.)

The next question is how far above the average we have to get before coincidence becomes an absurdly unlikely explanation. There is a formula for this, but I won't go through it since this post has gotten long. But here is one example: In the case where two languages both have 20 consonants evenly distributed (or, more realistically, in comparing two consonants in two languages both of which have a frequency of 5% in the word-position being compared in their respective languages), the probability of finding more than 5 correspondences (i.e. 6 or more) in a 100-wd. sample is 0.000000356, or roughly 1 in 2.8 million. (The chance of finding 5 or more is roughly 1 in 163,000.) So in this set of circumstances "6 or more" (i.e. a single correspondence set occurring in a given position -- say word-initially -- in 6 or more words) should be pretty well conclusive for demonstrating a non-chance and hence almost certainly historical (genetic or contact) relationship.

I think that working all this out mathematically is interesting and important for comparative linguistics for two reasons. First, it means that if you apply the comparison strictly (allow only one-to-one word comparisons, and one-to-one phoneme comparisons) you can get more knowledge from less information -- you can potentially demonstrate a relationship with much less data than comparativists have traditionally thought necessary. This is important to me, because I work in Afroasiatic, where the perpetual concern is exactly how to get more knowledge with less information (few old texts for most languages).
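The arithmetic in the preceding paragraphs can be checked with a short script. This is only a sketch under stated assumptions, not the calculation used in the post: the formula for the "how far above the average" step is not given there, so a plain binomial upper tail is assumed here (each word pair is treated as an independent trial whose match probability is the product of the two consonant frequencies). The function names are invented for illustration, and the tail values the script prints need not agree digit for digit with the figures quoted above.

```python
def expected_chance_matches(freq_a, freq_b, n_pairs):
    """Average number of chance matches: frequency product times number of trials."""
    return freq_a * freq_b * n_pairs


def prob_at_least(k, n_pairs, p_match):
    """Probability of k or more chance matches in n_pairs trials.

    Assumption: a binomial model with independent trials; the post does not
    say which formula was actually used.
    """
    term = (1.0 - p_match) ** n_pairs  # P(exactly 0 matches)
    odds = p_match / (1.0 - p_match)
    tail = 0.0
    for i in range(n_pairs + 1):
        if i >= k:
            tail += term
        # Step from P(exactly i matches) to P(exactly i + 1 matches).
        term *= (n_pairs - i) / (i + 1) * odds
    return tail


if __name__ == "__main__":
    # Ten evenly distributed consonants in each language, 10,000 word pairs.
    print(expected_chance_matches(1 / 10, 1 / 10, 10_000))
    # Twenty-five consonants in each language, 1000 word pairs.
    print(expected_chance_matches(1 / 25, 1 / 25, 1000))
    # Twenty consonants in each language (5% frequency each), 100 word pairs.
    p = (1 / 20) ** 2
    print(prob_at_least(6, 100, p), prob_at_least(5, 100, p))
```

The tail is accumulated term by term rather than through large binomial coefficients, so the same function also works for much longer word lists.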
But the other side of this is that the mathematics makes it perfectly clear that if you relax the semantic and phonemic criteria far enough, you quickly come to a point where the expected number of chance correspondences becomes so high that it becomes practically impossible to mount an effective demonstration of a relationship. The relevant parameters are the number of comparisons and the frequency of consonants.

If you allow for comparison of each word with a wide range of semantically close words, you multiply the number of comparisons and effectively increase the sample size. (A pair of 1000-wd. lists with one-to-one matching is the same mathematically as two 100-wd. lists with each word compared with 10 words in the other language -- both give 1000 pairs or trials.) Going back to the previous example with a frequency of 5% for each consonant, the numbers of matches you need to get to the 1-in-a-million-or-better range for different sample sizes are: 200 pairs: 8; 500 pairs: 10; 1000 pairs: 14; 2000 pairs: 19. In other words, although the average number of expected chance correspondences increases geometrically with sample size, the number needed for reasonable certainty of non-chance goes up at a higher rate. If you are considering each word in a 1000-wd. list against 20 or 30 semantically close words, the effective sample size -- and hence the number of matches needed to demonstrate a non-chance relationship -- becomes gigantic. (I don't have a calculator powerful enough to calculate it though, sorry.)

Similarly, if you allow many-to-many phoneme matchings, you effectively increase the frequency. If you compare two systems of 15 consonants at 3 points of articulation one-to-one, the chance of a match is on average 1/15 squared. The expected number of chance matches in a 1000-word sample is between 4 and 5 (4.44) -- reasonable. The chance of matching any two consonants at the same point of articulation is 1/3 squared. In a 1000-wd. sample the expected number of chance matches is 111 -- a big jump.
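The effect of loosening the criteria can be sketched the same way. As before, a binomial model is assumed that the post does not spell out, and `effective_pairs`, `min_matches_needed` and the `cutoff` parameter are hypothetical names chosen for illustration. Comparing each word against several semantic candidates is treated as multiplying the number of trials, which is how the post describes it, and the script searches for the smallest match count whose chance probability falls below the cutoff. The thresholds it yields may differ somewhat from those listed above, depending on the formula originally used.

```python
def prob_at_least(k, n_pairs, p_match):
    """Binomial probability of k or more chance matches (assumed model)."""
    term = (1.0 - p_match) ** n_pairs  # P(exactly 0 matches)
    odds = p_match / (1.0 - p_match)
    tail = 0.0
    for i in range(n_pairs + 1):
        if i >= k:
            tail += term
        term *= (n_pairs - i) / (i + 1) * odds
    return tail


def effective_pairs(list_length, candidates_per_word):
    """Number of trials when each word is compared with several candidates."""
    return list_length * candidates_per_word


def min_matches_needed(n_pairs, p_match, cutoff=1e-6):
    """Smallest k for which k or more chance matches is rarer than cutoff."""
    for k in range(n_pairs + 1):
        if prob_at_least(k, n_pairs, p_match) < cutoff:
            return k
    return None  # not reached for 0 < p_match < 1 and a sensible cutoff


if __name__ == "__main__":
    p = (1 / 20) ** 2  # two consonants, each with 5% frequency
    for n in (100, 200, 500, 1000, 2000):
        print(n, min_matches_needed(n, p))
    # A 1000-word list with each word compared against 20 close meanings.
    loose = effective_pairs(1000, 20)
    print(loose, min_matches_needed(loose, p))
    # Expected chance matches in 1000 pairs with 15 consonants:
    # strict one-to-one matching vs. any consonant at the same of 3 places.
    print(1000 / 15 ** 2, 1000 / 3 ** 2)
```

On this reading the required count for a 1000-word list checked against 20 or 30 candidates per word is large but computable, which is the quantity the post leaves uncalculated.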
Thus with very loose criteria, the comparatist is in the paradoxical position of having to prove the existence of hundreds of "bad" (random) correspondences in order to have any confidence of having found any good ones (ones which actually reflect language history). And if there really are any good correspondences, the problem of how to pick them out from all the random "noise" which is certain to be there is daunting.

--
-----------------------------------------------------------
Robert R. Ratcliffe
Associate Professor, Arabic and Linguistics,
Dept. of Linguistics and Information Science
Tokyo University of Foreign Studies
Asahi-machi 3-11-1, Fuchu-shi, Tokyo 183-8534 Japan

From kroch at change.ling.upenn.edu Thu Sep 14 01:33:29 2000
From: kroch at change.ling.upenn.edu (Tony Kroch)
Date: Wed, 13 Sep 2000 21:33:29 EDT
Subject: Announcing the second edition of the Penn-Helsinki Parsed Corpus of Middle English
Message-ID:

----------------------------Original message----------------------------
The second edition of the Penn-Helsinki Parsed Corpus of Middle English (PPCME2) is now publicly available under the conditions outlined below. It consists of 55 text samples containing 1.3 million words of syntactically annotated Middle English prose and ranging over four time periods, from 1150 to 1500. Like the first edition of the PPCME, the PPCME2 is based on the Middle English portion of the Helsinki Corpus of English Texts that was created at the University of Helsinki under the direction of Matti Rissanen and Ossi Ihalainen. The size of the text samples in the second edition has been enlarged so that the total corpus size is nearly three times larger. In addition, the corpus is now tagged for part of speech and the syntactic annotation system is richer.
For the earliest time period, all texts except one are complete; the exception is the Ancrene Riwle sample, which contains approximately 50,000 words. For the later time periods, two texts per time period were expanded to approximately 50,000 words. The remaining texts are represented by the Helsinki Corpus sample. The PPCME2 is being distributed on a CD-ROM that includes several files for each text in the corpus: - a file with unannotated text - a file with philological and other information about the text (manuscript and edition used, date, dialect, genre, and word count of the sample) - a file in which individual words are tagged for part of speech - a file that is annotated for syntactic structure Available with the corpus is CorpusSearch, a Java program written by Beth Randall that runs under Unix, Linux, MacOS and Windows. CorpusSearch uses standard syntactic predicates like ``(immediately) precedes'', ``(immediately) dominates'', and Boolean combinations thereof, and it allows outputs of previous search as inputs to further searches. To order the PPCME2, please go to http://www.ling.upenn.edu/mideng and follow the instructions there. The cost of a subscription to the corpus is $200 and the cost of a license for CorpusSearch is $50. The items may be purchased together or separately. Proceeds from the sale of the corpus will pay for improving the corpus and for increasing its size over time. Proceeds from the sale of CorpusSearch will go to the author. The PPCME2 was designed and built by Anthony Kroch and Ann Taylor at the University of Pennsylvania. Supplementary assistance was provided by Beatrice Santorini. The PPCME2 is part of of a larger project to produce a parsed diachronic corpus of English from 800 to 1800. 
The Old English part is under construction at York under the direction of Anthony Warner, Susan Pintzuk, and Ann Taylor and the Early Modern English part is under construction at the University of Pennsylvania under the direction of Kroch and Santorini. From Ann.Kumar at anu.edu.au Thu Sep 14 11:01:01 2000 From: Ann.Kumar at anu.edu.au (Ann Kumar) Date: Thu, 14 Sep 2000 07:01:01 EDT Subject: "only six" argument Message-ID: We have been following the HISTLING discussion initiated by Larry Trask with interest, because we have been involved over the last two years in a particular case that had to solve the problem of the amount of data that is necessary to establish relatedness. (Not genetic, but via borrowing). We have been doing what Robert R. Ratcliffe takes as his starting point in his last e-mail, i.e. "approaching unclassified languages or languages which haven't been compared before [where] the first question we have to ask is whether these languages have something in common which cannot be due to chance or coincidence." The results will be published in the December issue of Oceanic Linguistics, but we thought it might interest LIST members to have a sneak preview, at least of the (rahter long) section on probability, where we discuss relevant issues. (The section is attached.) We were trying to find out whether some semantic and phonological matches in Old Japanese and Old Javanese lexis were too extensive to be due to chance. In this particular case, rather than looking at single sound correspondences, we used whole-word comparison, and of longer words (CVCVC structure) with recurrent sound correspondences. While it is not possible to go into the calculations here, it turned out that in this case only one match between words of this length could be expected to occur by chance. 
In the section on probability Rose discusses the usefulness of the approaches taken earlier by Nichols and Ringe and goes on to propose that a Bayesian, rather than frequentist, statistical approach should be the preferred option. We have attached this section. We agree with Ratcliffe that "Numerical criteria and probability theory are the most reliable means for making judgements of this type". But we are able to demonstrate a few more things that might interest LIST readers, (and can also offer some real data!). As mentioned, we also have some points to make concerning the appropriateness of the frequentist (as opposed to a Bayesian) paradigm for evaluating questions of this kind (i.e assessing the probability of a hypothesis). (Bayesian formulations are used, for example, in forensics. We don't know to what extent historical linguistics are aware of them, so we offer them in case people are interested.) Ann Kumar Phil Rose -------------- next part -------------- A non-text attachment was scrubbed... Name: short_prob.doc Type: application/mac-binhex40 Size: 71221 bytes Desc: not available URL: -------------- next part -------------- =========================================================================== Dr Ann Kumar Vice-President, Australian Academy of the Humanities Centre for the Study of Asian Societies and Histories Faculty of Asian Studies Canberra ACT 0200 Australia Tel. (02) 6249 3677/4658 fax. (02) 6279-8326 From X99Lynx at aol.com Mon Sep 25 14:55:08 2000 From: X99Lynx at aol.com (Steve Long) Date: Mon, 25 Sep 2000 10:55:08 EDT Subject: Superlative Forms and Swallowing Camels Message-ID: ----------------------------Original message---------------------------- On Sun, 3 Sep 2000 10:19:19 EDT. jer at cphling.dk wrote: <> (Hi, Jens!) I must ask of course how we know that one language or the other already had "a perfectly good form of the superlative?" 
With all due respect to the writer, from whom I've already learned a great deal, I must ask whether the case is as clear cut as he perceives it. This camel might be the kind you find in animal cracker boxes -- bite-sized. Ironically, two relevant languages make no morphological distinction between the comparative and the superlative - Manx and French. If this says nothing else, it proves that languages can find themselves without any form of the superlative, much less "a perfectly good one." Whatever forces caused the loss of the superlative in those languages may have caused an earlier loss in either Celtic or Italic. And that would have meant one or the other of those two languages may have been in need of a superlative form and therefore had a very good reason to borrow it. And doesn't the question <> work both ways? Why would "Italo-Celtic" innovate a superlative form when they already had a perfectly good one? In my mind this raises again the question of how one distinguishes between a borrowing and descent from a common ancestor, IF the word or form is actually old enough to predate indicia of borrowing. Also, there are those of us who suspect that going back 4000+ years creates a great deal of uncertainty about what languages -- both IE and non IE -- the form could have been borrowed from. The reconstruction the author offers -- "the Celtic superlative in *-isamo- and the Italic one in *-is(s)amo- cannot be imagined to be parallel developments (from *-mHo- [whence Ital./Celt *-amo-] with deictic vs. *-isto with other adjectives)" -- does not foreclose the possibility that development is one that occurred in some third language (or the specialized dialect of an influential, itinerant linguistic community -- like scribes or priests) and that both Latin and Celtic "borrowed" it independently. And finally, why would a language borrow a word like "superlative" when presumably back in the days of Old English, it "already had a perfectly good one?" 
Regards,
Steve Long

From larryt at cogs.susx.ac.uk Tue Sep 26 14:35:03 2000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Date: Tue, 26 Sep 2000 10:35:03 EDT
Subject: Sum: German Forst 'forest'
Message-ID:

----------------------------Original message----------------------------
Some days ago I posted a query about the disputed etymology of German Forst 'forest'. I got only three replies, but those were interesting.

The query was whether German Forst derives, like English 'forest', from a late Latin word, or whether it is a native word derived ultimately from the German word for 'fir tree'.

Two of the respondents were skeptical of the German etymology. One of them suggested it might be a residue of the unfortunate Romantic tendency to seek "Germanic" etymologies for loans from Latin.

The third, however, was much more enthusiastic about the Germanic etymology, and noted that the derivation of the late Latin word from a word meaning 'outside' is far from secure, and that a loan from Germanic has been suggested. Well, turnabout is fair play, I guess.

Anyway, it appears that I cannot yet add 'forest' and Forst to my little collection of striking chance resemblances. But one of my respondents (SG) sent in a couple of lovely examples of chance resemblances:

German /Scheune/ "shack" : Coptic /shoine/ id.
German /Schuh/ "shoe" : Itelmen /sxu/ (works even better with Dutch) aso.

(Itelmen is a Chukcho-Kamchatkan language of eastern Siberia.)

My thanks to David Fertig, Stefan Georg, and Paolo Ramat.
Larry Trask COGS University of Sussex Brighton BN1 9QH UK larryt at cogs.susx.ac.uk Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad) Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad) From larryt at cogs.susx.ac.uk Wed Sep 27 11:35:18 2000 From: larryt at cogs.susx.ac.uk (Larry Trask) Date: Wed, 27 Sep 2000 07:35:18 EDT Subject: Q: Sarich and historical linguistics Message-ID: ----------------------------Original message---------------------------- In a few weeks, I'm giving a talk on the perception of language and linguistics among our academic colleagues in other disciplines, such as psychology, anthropology, archaeology, primatology and genetics. Most of this talk will deal with non-historical matters, but I want also to talk about the seemingly immense influence of the long-rangers among our colleagues, who often appear to believe that the long-rangers speak for historical linguistics. See, for example, the writings of the geneticist Robert Sokal, of the palaeoanthropologist Richard Klein, and of the primatologist Robin Dunbar. But I've become particularly interested in the writings of the eminent molecular anthropologist Vincent Sarich, one of the founders of the out-of-Africa hypothesis of human origins. Unlike most other non-linguists, Sarich has stepped into historical linguistics in a big way -- and he doesn't like us historical linguists very much. In a 1994 article, he warmly defends the long-rangers, and he hurls abuse at those linguists who have criticized their work, accusing the critics of being anti-scientific and of acting from the basest motives: Vincent M. Sarich (1994), 'Occam's razor and historical linguistics', in M. Y. Chen and O. J. L. Tzeng (eds), In Honor of William S.-Y. Wang, Pyramid Press, pp. 409-430. But I'm more interested right now in another of Sarich's articles, published on the Web in 1994 and apparently not published elsewhere. 
This article also carries a good deal of abuse directed at the critics of Greenberg and Ruhlen: http://pubpages.unh.edu/~jel/sarich.html Here is the passage I'm interested in: "A similar scenario would also appear to apply in the linguistic realm, but to see it we first need to challenge the extremely conservative current consensus among most linguists that relationships among languages that diverged more than perhaps 7,000-8,000 years ago are, at present, unknowable. A simple exercise suffices here to show that this consensus is unreasonably pessimistic. One simply sits down with, for example, Buck's A Dictionary of Selected Synonyms in the Principal Indo-European Languages, a basic word list, and some independent knowledge of two or more languages representing distinct Indo-European groups. I used English and Croatian, representing, respectively, its Germanic and Slavic branches. If one then asks what proportion of the words in modern Croatian appear, simply by inspection (but allowing for some phonetic and semantic drift), to be cognate with the reconstructed Proto-Indo-European (PIE) form (or, where that is unavailable, the English word), one gets a minimum figure of about 60%. For example, snow, snjeg, *sneigwh; many, mnogo, *monogho; blood, krv, *kru; tree/wood, drvo, *dru; earth, zemlja, *ghem. Similar results were obtained using native speakers of Spanish and Bengali, and for Armenian and Albanian using Decsy's The Indo-European Protolanguage: a Computational Reconstruction. Thus 60% survival seems to be a reasonably representative figure for the survival of PIE roots with meanings in extant Indo-European languages. 
"Now obviously some number of these matches will be coincidental (though that number will likely be small, as illustrated by the fact that Chinese, by the same test, will show less than 10% apparent 'cognacy' with PIE, English, or Croatian -- I am indebted to Dr W S-Y Wang for this comparison), but, by the same token, some will be missed when the degree of phonetic or semantic change makes cognacy less than obvious. For example -- foot, noga, *ped -- where one might miss the English correspondence because of the phonetic changes, and would (and, perhaps, should) certainly miss the Croatian unless one remembered that 'pod' in Croatian means 'under', and that an association between 'under' and 'foot' is perfectly reasonable. This would imply a cognacy loss of less than 10% per millennium along a lineage, implying that even at a time depth of 12,000-14,000 years; that is, twice the probable time which separates modern Croatian from its Proto-Indo-European ancestor, one might retain 30% or so phonetic/semantic cognacy. Thus one could recognize relationships among languages whose common ancestor lay that far in the past provided that one looked at a sufficient number of them, and avoided simple binary comparisons. That is, if each of two descendant languages retains 30% cognacy with the ancestral language, they will, on average, share only 9% [(0.3)2] with one another -- and this gets into the chance area of similarity. On the other hand, if you look at 10 such languages, three, on the average, will retain a particular cognate -- greatly increasing your chances of recognizing relationships among them, and of reconstructing the ancestral form. 
This is the procedure and argument of Greenberg [(1987); see also discussion in Ruhlen (1987)], and, whatever the questions that might be raised about certain details, there can be no doubt the current general consensus among most linguists that relationships among languages older than about 7,000 years are, at present, unknowable, is unrealistically and unreasonably pessimistic and conservative."

[END QUOTE]

Now, many of these general issues have been much discussed elsewhere, and I have my own views, which I will reserve for the time being. But I am interested in hearing comments from colleagues on any part of this passage, though most particularly on the following points:

* the use to which Sarich puts Buck's dictionary;
* the claim that any given living IE language retains about 60% of the PIE lexicon in easily recognizable form;
* the claim that genuine cognates among living IE languages are overwhelmingly obvious and trivial to identify by inspection alone;
* the claim that this result automatically generalizes to other families, even to families which are as yet unrecognized.

Please reply directly to me, since I have no wish to flood this list with discussions of long-ranger work. I'll post a summary when I can.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK
larryt at cogs.susx.ac.uk
Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)

From DISTERH at UNIVSCVM.SC.EDU Fri Sep 29 12:17:40 2000
From: DISTERH at UNIVSCVM.SC.EDU (Dorothy Disterheft)
Date: Fri, 29 Sep 2000 08:17:40 EDT
Subject: Sarich and historical linguistics
Message-ID:

In a message dated 9/27/2000 6:36:24 AM, larryt at cogs.susx.ac.uk writes:

<<... I want also to talk about the seemingly immense influence of the long-rangers among our colleagues, who often appear to believe that the long-rangers speak for historical linguistics.
...I've become particularly interested in the writings of the eminent molecular anthropologist Vincent Sarich, one of the founders of the out-of-Africa hypothesis of human origins. Unlike most other non-linguists, Sarich has stepped into historical linguistics in a big way -- and he doesn't like us historical linguists very much.>>

I hope Larry and everyone else will understand my posting this to the list. I think it's important to just add a few observations about Sarich that may put his remarks in context.

First of all, it should be remembered that Vincent Sarich has for a long time taken an advocacy position (and called himself an advocate) regarding certain aspects of human genetics. He has been, for example, prominently involved in the dialogue on race and IQ. And it should also be noted that the article Prof Trask cites (http://pubpages.unh.edu/~jel/sarich.html), entitled "RACE and LANGUAGE in PREHISTORY", is clearly a piece of "advocacy," which obviously treats historical linguistics only as it relates to and serves to advance Sarich's goals with regard to a somewhat larger argument.

Sarich's position on Greenberg and longrangers is pretty much dictated by the Out-of-Africa hypothesis and various other positions Sarich takes regarding genetics and human culture.

What is clear from the piece is that Sarich is trying to backdate language far enough to make its diversity correlate with current human genetic diversity. Sarich advocates the view that modern human diversity, human intelligence and cultures were born full-blown at some point after the Out-of-Africa event some 100,000 years ago -- with relatively little convergence since. In the piece, his argument with scientists claiming that language is a recent development is expressly motivated by his position that language matches up with racial genetics. Sarich is not really a lumper in the strict sense.
And given all the above, some caution might be called for in using Sarich as representative of an academic non-linguist's views of historical linguistics. I suspect that if it better served his larger purposes, he would be citing Lehmann and Trask.

This isn't the first time, of course, that historical linguistics has been called upon to support wider conclusions about human history. Sarich is fairly unique, however, in viewing certain elements of it as supporting conclusions that reach back some 30,000 years.

It should be said that there are serious scientists who are not comfortable with Sarich's understanding of the evidence of paleo-culture, much less of his understanding of paleo-language. (And that's not to say that the genetic implications of Out-of-Africa haven't been challenged either.)

Some of us think Sarich may be seriously underestimating paleo-humans and how long it took to develop something as sophisticated as human biology, human culture and human language. On another web page, for example, one can find an article by the formidable paleobiologist Henry Gee about the Schöningen spears. (http://quartz.ucdavis.edu/~GEL115/spears.html) To some, the sophistication and possibly accumulative design of these 400,000 year old hunting javelins suggests that they could not have been developed or redeveloped in a single generation. And accumulating and transmitting complex knowledge from one generation to the next suggests some form of transmission, perhaps some form of language.

Finally, I'd point out also that maybe it is the traditional assumption of strict vertical descent in languages that makes any part of historical linguistics attractive to Vincent Sarich and his "anti-convergenist" monogenetic polemics.
Those of us who think that there may be a relatively high degree of convergence in linguistic history don't find commonalities between languages extremely precise in illuminating prehistory or necessarily indicative of some common noble biological ancestor. After all, the most basic function of language is communication and that should move us all to try to speak the same language, not different ones.

And, of course, it's refreshing for us "convergenists" to see that the primacy of vertical descent has recently taken a good drubbing in biology. (See, e.g., Stephen Jay Gould's "Linnaeus's Luck" in Natural History, September 2000). And some of us expect the same to eventually happen on a different level to "Out-of-Africa".

In the meantime, it might be suggested that Vincent M. Sarich's views are not at all the best reflection of how informed non-linguists understand historical linguistics.

Steve Long

From ratcliff at fs.tufs.ac.jp Fri Sep 1 13:16:51 2000
From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe)
Date: Fri, 1 Sep 2000 09:16:51 EDT
Subject: Q: the 'only six' argument
Message-ID:

----------------------------Original message----------------------------
Larry Trask wrote:

> So, my question: does anybody believe that any version of this
> statement is valid? More precisely, do we have a number N and
> a set of criteria C such that the existence between two languages
> of N matches satisfying criteria C is enough to guarantee that
> the languages must be related?

Wasn't this what Donald Ringe was trying to do? (The Factor of Chance in Language Comparison, Philadelphia 1992).

But the statement phrased as you have it is certainly not valid. First no number of "matches" (sound correspondences?) can *guarantee* that the languages are related, only that the probability of their being related is high.
Second, "related" has to be understood as historically related rather than genetically related, because numerical criteria only help to decide the issue of chance vs. non-chance similarity, not which type of historical contingency (descent from a common source or subsequent contact) may have produced the non-chance pattern. Third, there is no absolute number valid in all cases, because it depends on the size and nature of the sample being compared.

Specifically in the case of sound correspondences, the bigger the dictionary or word list, the more chance correspondences can be expected; and the smaller the segment inventories of the languages compared, the more chance correspondences can be expected. This is because the average expected number of chance occurrences of an event (in this case a correspondence at a given position in a word) is the probability of the event (in this case the relative frequencies in the given position of the segments compared, multiplied by each other) times the number of trials (in this case the number of semantically equivalent words available for comparison).

So if you have two languages A and B, both of which have only ten consonants evenly distributed in word-first position, and you have an A-B dictionary which has 10,000 entries correlating one word in A with one and only one semantic equivalent in B, with no synonyms in either language, you'd expect to find about 100 matches between any first consonant in A and any first consonant in B (chance that x will occur as first consonant in A: 1/10, multiplied by chance that y will occur as first consonant in B: 1/10, multiplied by total places where 1st C of a word in A can be compared with 1st C of word in B: 10,000). So you wouldn't be justified in suspecting a historical relationship till you got a good bit over 100 matches.
On the other hand, if you had two languages with 25 consonants evenly distributed and a lexicon based on a 1000-word random sample, you'd expect an average of only 1.6 first consonant matches (1/25 * 1/25 * 1000). So you'd be justified in suspecting a non-accidental, hence historical, relationship even with as few as 4 or 5 matches.

Bobby D. Bryant wrote:

> In short, I don't think such a formalization of the problem in terms of N and
> C is going to work in practice. At some level you are always going to have
> to pile on enough examples to convince your peers, which is of course the way
> things have always worked.

Piling on enough examples to convince your peers no longer works in practice, or else the long-distance comparison debates would not have become as acrimonious as they have. Formalizing the problem seems to me to be the only way forward. Besides, isn't that where the joy of research lies -- in ever sharpening and refining our understanding of our subject matter and of the tools we use to analyze it?

--
-----------------------------------------------------------
Robert R. Ratcliffe
Dept. of Linguistics and Information Science
Tokyo University of Foreign Studies
Asahi-machi 3-11-1, Fuchu-shi, Tokyo 183-8534 Japan

From jer at cphling.dk Sun Sep 3 14:19:19 2000
From: jer at cphling.dk (Jens Elmegaard Rasmussen)
Date: Sun, 3 Sep 2000 10:19:19 EDT
Subject: Q: the 'only six' argument
In-Reply-To: <39AE4FFF.1750AFC5@mail.utexas.edu>
Message-ID:

----------------------------Original message----------------------------
Dear List,

I have occasionally shocked my students by insisting that ONE *probative* example is enough to prove the point for which it is probative. The statement, of course, is tautological: If the example did NOT prove its point, it would not be probative, for that's what the word probative means.
The consequence is that, e.g., in Indo-European, certain disputed groupings MUST be accepted unless we are willing to swallow very awkward camels: If the Celtic superlative in *-isamo- and the Italic one in *-is(s)amo- cannot be imagined to be parallel developments (from *-mHo- [whence Ital./Celt *-amo-] with deictic vs. *-isto- with other adjectives), and one cannot be assumed to have been borrowed from the other (would you borrow a new form of the superlative, if your language has a perfectly good one already?), then there WAS an Italo-Celtic node in the splitting-up of the IE unity. Similar arguments could be set up for some of the points uniting Baltic and Slavic which look strong enough in themselves to carry the burden of proof even if they were not supported by others.

Nice to see the list blossoming again.

Jens E. Rasmussen

From hwhatting at hotmail.com Mon Sep 4 12:46:49 2000
From: hwhatting at hotmail.com (Hans-Werner Hatting)
Date: Mon, 4 Sep 2000 08:46:49 EDT
Subject: Q: the 'only six' argument
Message-ID:

----------------------------Original message----------------------------
On Sun, 3 Sep 2000 10:19:19 EDT, J. E. Rasmussen wrote:

>I have occasionally shocked my students by insisting that ONE *probative*
>example is enough to prove the point for which it is probative. The
>statement, of course, is tautological: If the example did NOT prove its
>point, it would not be probative, for that's what the word probative
>means.
> The consequence is that, e.g., in Indo-European, certain disputed
>groupings MUST be accepted unless we are willing to swallow very awkward
>camels: If the Celtic superlative in *-isamo- and the Italic one in
>*-is(s)amo- cannot be imagined to be parallel developments (from *-mHo-
>[whence Ital./Celt *-amo-] with deictic vs.
>*-isto- with other adjectives), and one cannot be assumed to have been
>borrowed from the other (would you borrow a new form of the superlative,
>if your language has a perfectly good one already?), then there WAS an
>Italo-Celtic node in the splitting-up of the IE unity. Similar arguments
>could be set up for some of the points uniting Baltic and Slavic which
>look strong enough in themselves to carry the burden of proof even if
>they were not supported by others.

Languages don't only borrow words or formations because they don't have an adequate expression for a concept. Simply imitating a formation seen as more expressive, or the usage of a language which is seen as more prestigious, also plays a role. A good example from modern German is the borrowing of the English way of expressing the year in which an event happened. The traditional way in German is to say "Es geschah 1999.", but now quite often one can find "Es geschah in 1999.", which is a clear calque on English. The reason behind this is, of course, the great prestige of the English language, the widespread knowledge of it, and also that this formation is more expressive than the traditional German one. A superlative formation seems to be a good candidate for borrowing on grounds of expressiveness.

I don't want to say that the superlative formation quoted cannot serve as proof for Italo-Celtic unity. But if there is only one example (in this case, of course, there is more than one, but the evidence is still inconclusive), one can never exclude borrowing. The only thing it proves is that the speakers of Proto-Celtic and Proto-Italic have been living close enough to borrow from one another.

I would like to add the following to the general discussion:

1.) No quantity of matches can ever prove genetic relationship. One can probably find thousands of matches between, e.g., French and English or Latin and Albanian, without Albanian or English being Romance languages.

2.)
There is, as far as I know, some sort of communis opinio that certain matches (from basic vocabulary, grammatical morphemes) are more important for proving genetic relationship than others.

3.) I would recommend that once one has collected one's matches, one should try a reconstruction. If the results are a decent basic vocabulary and a basic common grammar, the languages examined are most probably genetically interrelated. There's of course the question of how to define "decent basic vocabulary" and "basic common grammar", and that's (besides the questionableness of many matches) the main problem for wide-range reconstructions like Nostratic, Proto-World etc. Anyone interested in formulating some minimalist criteria?

4.) Always look at the history behind the matches. Are there historical links between the carriers of the respective languages, and of which kind are they? This is of course impossible if the history is not known, and if one wants to use language to reconstruct history.

-- Essentially, I think a numerical approach does not take us very far. The most important question seems to me: can we reconstruct a system based on the matches, and what does it look like? If we get a basic grammar and basic vocabulary, there are strong reasons to suspect genetic relationship; if we get (say) a group of religious words, we can assume borrowing based on religious influences, and so on. Here, of course, numbers play a role -- one simply needs a sufficient number of matches to constitute a system. But if we have too small a number of matches to form a convincing system, only historical evidence can help.

Best regards,
Hans-Werner Hatting, mag. phil.
From r.rankin at latrobe.edu.au Mon Sep 4 12:40:49 2000 From: r.rankin at latrobe.edu.au (R. Rankin) Date: Mon, 4 Sep 2000 08:40:49 EDT Subject: Q: the 'only six' argument Message-ID: ----------------------------Original message---------------------------- Larry Trask wrote: > Quite often, in my reading, I've come across a statement of the > following type: > "The presence of only six good matches between two languages > is enough to show that the languages must be genetically related." > ... the number is always different. Six is the smallest I've ever seen, but > I've also seen 15, 50 and various other numbers. I don't recall often seeing such claims, but it is probable that I just disregarded them and read on. Personally, I'm very skeptical about the possibility of developing any airtight criteria for genetic relationship that will work cross-linguistically. This sort of thing has to be done on a case-by-case basis. Factors may include various structural considerations (phonological, morphological and lexical), likelihood of creolization, likelihood of participation in a Sprachbund, etc. Meillet is said to have remarked that one could tell if a language were Indo-European or not just by examining the conjugation of the verb 'be' (though at my present location I cannot give you a citation). I tend to agree with his insistence on morphological criteria, but still think there are far too many potential variables for us to permit ourselves to be dogmatic. Noodling around with "universal criteria" is an enterprise for synchronists; we should not let ourselves be seduced into trying it in genetic linguistics. Bob Rankin -- Robert L.
Rankin, Visiting Fellow Research Center for Linguistic Typology Institute for Advanced Study La Trobe University Bundoora, VIC 3083 Australia Office: (+61 03) 9467-8087 Home: (+61 03) 9499-2393 From degraff at MIT.EDU Tue Sep 5 09:57:18 2000 From: degraff at MIT.EDU (Michel DeGraff) Date: Tue, 5 Sep 2000 05:57:18 EDT Subject: Q: the 'only six' argument In-Reply-To: Your message of "Mon, 04 Sep 2000 08:46:49 EDT." Message-ID: ----------------------------Original message---------------------------- Holding humbly and tightly on to my creolist-cum-syntactician hat, I would like to inquisitively and constructively piggy-back on Hans-Werner Hatting's observations and questions regarding (alleged) criteria for genetic relatedness. > 1.) No quantity of matches can ever prove genetic relationship. One can > probably find thousands of matches between, e.g., French and English or > Latin and Albanian, without Albanian or English being Romance languages. In a similar vein, note that the etymology of Haitian Creole---a (so called) "non-genetic" language---is overwhelmingly French while the lexicon of Modern English---a (so called) "genetic" language---is mostly non-Germanic etymologically. Besides, virtually all Haitian Creole affixes have cognates in French affixes whereas English has many affixes of non-Germanic origins. By the way, the latter observation about Haitian Creole suffices to falsify all these `classic' Creole-genesis scenarios that posit an affixless-pidgin phase a la Jespersen, Bickerton, McWhorter, Seuren, etc. > 2.) There is, as far as I know, some sort of communis opinio that certain > matches (from basic vocabulary, grammatical morphemes) are more important > for proving genetical relationship than others. Virtually all of Haitian Creole's grammatical morphemes are etymologically French. > 3.) I would recommend that if one has collected one's matches, one should > try a reconstruction.
If the results are a decent basic vocabulary, and a > basic common grammar, the languages examined are most probably genetically > interrelated. There's of course the question how to define "decent basic > vocabulary" and "basic common grammar", and that's (besides the > questionableness of many matches) the main problem for wide-range > reconstructions like Nostratic, Proto-World etc. Anyone interested in > formulating some minimalist criteria? Given what I've noted above vis-a-vis lexicon and morphology, it then seems that *absence* of "basic common grammar" would be *the* structural criterion for claiming that Creole languages such as Haitian Creole are "non-genetic" languages that arose via "abnormal transmission" whereas French, say, is a "genetic" language that arose via "normal transmission". Let me try and be more precise as to what I think are the implications of a hypothetical "basic common grammar" with respect to the genetic-vs-non-genetic hypothesis as it applies to, say, Haitian Creole vs. French. Whatever features define this "basic common grammar", these features must diverge when comparing the grammars of (colloquial) 17th-18th century French dialects to that of Haitian Creole, and such divergences must be *qualitatively* different from their counterparts in the ("genetic") course of French diachrony. So far, I have not been able to isolate such features. Whatever divergences exist between colloquial 17th-18th century French dialects and Haitian Creole (e.g., `loss' of verbal inflection, verb-placement differences, etc.) seem to have counterparts in the diachronic course of `genetic' languages. And what I find most intriguing is that such divergences in `genetic' diachrony also seem to coincide with the history of contact within these `genetic' diachronies. This was, of course, noted by Meillet, although he would most likely not agree with the conclusions I seem drawn to.
In any case, if the "basic common grammar" remains elusive, then perhaps it's time to seriously (re-)challenge the alleged (non-)genetic dichotomy between Creole and non-Creole languages and/or the very concept of "genetic relatedness" as a linguistically (i.e., *structurally*) definable concept. Then again, I still need to learn more about the structural basis of genetic linguistics. This, I look forward to. -michel. ___________________________________________________________________________ MIT Linguistics & Philosophy, 77 Massachusetts Ave, Cambridge MA 02139-4307 degraff at MIT.EDU http://web.mit.edu/linguistics/www/degraff.home.html ___________________________________________________________________________ From larryt at cogs.susx.ac.uk Tue Sep 5 09:59:09 2000 From: larryt at cogs.susx.ac.uk (Larry Trask) Date: Tue, 5 Sep 2000 05:59:09 EDT Subject: Sum: the 'only six' argument Message-ID: ----------------------------Original message---------------------------- I was planning to post a summary of the responses to my query last week about the 'only six' argument. However, after the first few respondents replied to me privately, the responses shifted to the list, and so all of you will now have seen most of the responses already. I will therefore content myself with reporting that no one who has so far replied has expressed any great sympathy with any version of the 'only six' argument, and several people have been openly hostile. These negative responses don't surprise me at all. I am certainly not sympathetic to the 'only six' argument. It's just that I keep coming across claims of this sort every now and again, and I was beginning to wonder if a significant number of historical linguists were embracing such arguments. Apparently not. Anyway, I hope we may continue the discussion on the list, so long as Dorothy is willing. My mail spool has been rather short of interesting historical discussions since the IE list suddenly collapsed last April. 
My thanks to everyone who has replied. Larry Trask COGS University of Sussex Brighton BN1 9QH UK larryt at cogs.susx.ac.uk Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad) Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad) From larryt at cogs.susx.ac.uk Fri Sep 8 12:18:58 2000 From: larryt at cogs.susx.ac.uk (Larry Trask) Date: Fri, 8 Sep 2000 08:18:58 EDT Subject: Q: German Forst 'forest' Message-ID: ----------------------------Original message---------------------------- This is an etymological question. English 'forest' is, of course, borrowed from Old French, where it goes back to Late Latin <forestis (silva)> 'outer forest', with the first element possibly from <foris> 'outside'. I had always assumed that German <Forst> 'forest' had the same origin. But, on checking, I find that things are more complicated. Some sources agree that the German word is of the same origin as the English one. But other authorities, including Kluge, give a quite different etymology. They derive <Forst> from an unrecorded *<forhist>, a derivative of Old High German <foraha> 'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' > 'conifer forest' > 'forest'. Davis, in his English edition of Kluge, observes that opinion is divided on this etymology. Just to complicate things, Middle High German had a word <forest> 'forest', which even the proponents of Kluge's etymology seem to agree is derived from Latin and unrelated to modern <Forst>. So, my question is this. Is there now general agreement on the etymology of <Forst>? Or is the question still up in the air? I ask because, if the Germanic etymology of <Forst> is confirmed, then 'forest' and <Forst> constitute one of the most wonderful chance resemblances I have ever seen -- right up there with English 'much' and Spanish <mucho> 'much', and English 'bad' and Persian <bad> 'bad'.
Larry Trask COGS University of Sussex Brighton BN1 9QH UK larryt at cogs.susx.ac.uk Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad) Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad) From paoram at unipv.it Sat Sep 9 19:04:21 2000 From: paoram at unipv.it (Paolo Ramat) Date: Sat, 9 Sep 2000 15:04:21 EDT Subject: R: Q: German Forst 'forest' Message-ID: ----------------------------Original message---------------------------- -----Original message----- From: Larry Trask <larryt at cogs.susx.ac.uk> To: HISTLING at VM.SC.EDU <HISTLING at VM.SC.EDU> Date: Saturday, 9 September 2000 1.55 Subject: Q: German Forst 'forest' >----------------------------Original message---------------------------- >This is an etymological question. > >English 'forest' is, of course, borrowed from Old French, >where it goes back to Late Latin <forestis (silva)> 'outer forest', >with the first element possibly from <foris> 'outside'. > >I had always assumed that German <Forst> 'forest' had the same >origin. But, on checking, I find that things are more complicated. > >Some sources agree that the German word is of the same origin >as the English one. But other authorities, including Kluge, >give a quite different etymology. They derive <Forst> from an >unrecorded *<forhist>, a derivative of Old High German <foraha> >'fir tree' (modern <Föhre>), with a semantic shift 'fir forest' > >'conifer forest' > 'forest'. Davis, in his English edition of Kluge, >observes that opinion is divided on this etymology. > >Just to complicate things, Middle High German had a word <forest> >'forest', which even the proponents of Kluge's etymology seem to >agree is derived from Latin and unrelated to modern <Forst>. > >So, my question is this. Is there now general agreement on the >etymology of <Forst>? Or is the question still up in the air? > >I ask because, if the Germanic etymology of <Forst> is confirmed, >then 'forest' and <Forst> constitute one of the most wonderful >chance resemblances I have ever seen -- right up there with >English 'much' and Spanish <mucho> 'much', and English 'bad' and >Persian <bad> 'bad'. > > >Larry Trask >COGS >University of Sussex >Brighton BN1 9QH >UK > >larryt at cogs.susx.ac.uk > >Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad) >Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad) Dear Larry, the etymology of Germ. _Forst_ proposed also in the 23rd ed. of Kluge's Etym. Wtb. der dt. Spr. (by E. Seebold, 1995) sounds rather unconvincing. From a Gmc. *_forhist_ "Gehegtes" we should have MHG _foerhest_ (with Umlaut) and NHG *_foerst_, just as we get _lengest_ (<*_langisto_), _ermest_ (<*_armisto_) etc., and NHG _laengst_, _aermst_. Moreover, _Forst_ seems not to be Proto-Gmc.: it is attested in Germ. and Dutch (_vorst_) only. Thus I think you are right: the hypothesis that we have here a loanword from Latin seems more plausible than the other one. Also De Vries, Nederl. etymol. Woordenb., says that OHG _forst_ (ca. 800) may derive from MLat. _forestis_, "reeds in 648 in een oorkonde voor Stavelot-Malmédy" ('already in 648 in a charter for Stavelot-Malmédy').
Best, Paolo From ratcliff at fs.tufs.ac.jp Sat Sep 9 19:05:04 2000 From: ratcliff at fs.tufs.ac.jp (Robert R. Ratcliffe) Date: Sat, 9 Sep 2000 15:05:04 EDT Subject: Sum: the 'only six' argument Message-ID: Larry Trask wrote: > I will therefore content myself with reporting that no one > who has so far replied has expressed any great sympathy with > any version of the 'only six' argument, and several people > have been openly hostile. > > These negative responses don't surprise me at all. I am > certainly not sympathetic to the 'only six' argument. It's > just that I keep coming across claims of this sort every now > and again, and I was beginning to wonder if a significant > number of historical linguists were embracing such arguments. > Apparently not. Wait just a second there. I may have sounded negative myself. But when I thought about it a little more, I realized that there is a legitimate and interesting argument there, and it ought to be in historical linguistics textbooks if it isn't. (I don't know if this is the argument you have seen, but I'd be interested to know if it IS in any textbooks.) Basically, IF one has set up the question properly and IF one has carried out the comparison with discipline and honesty (big ifs, of course), then a very small number of examples of a single sound correspondence is sufficient to demonstrate a historical (not necessarily a genetic) relationship beyond any reasonable doubt. Practically speaking, given the sample sizes we usually work with and the way that phonological systems are set up, in most cases, the necessary number is indeed around six or not much more. This isn't anything for anyone to be hostile to (or sympathetic to, for that matter); it simply follows necessarily from the logic of probability. I'll explain, but first a clarification. When you ask about numerical criteria for a genetic relationship, you are asking (at least) two separate questions.
Most of the respondents addressed the second question-- what are the criteria for determining if two historically related languages are related genetically-- as opposed to being related by contact or borrowing, or by being in a lexifier-creole relationship. Some respondents addressed the question of what criteria are relevant for subclassifying genetically related languages. As far as I can see (and as most of the respondents said), numerical criteria simply are not relevant for making these kinds of judgements. It's the nature of the similarities or commonalities, not the number of them, that counts. In any case probability theory doesn't come into play because in all these cases we have already ruled out coincidence as an explanation. But when approaching unclassified languages or languages which haven't been compared to each other before, the first question we have to ask is whether these languages have something in common which cannot be due to chance or coincidence. Numerical criteria and probability theory are the most reliable means for making judgements of this type. Here's how you end up with only six: First the average expected number of chance matches between any two consonants in any two languages (that is the expected number of times the consonants will appear in the same position in a word with the same meaning) is the frequency of the first consonant in its language times the frequency of the second consonant in its language times the number of word pairs available for comparison. Thus if ten percent of the words start with /t/ in one language and ten percent of the words in the other language start with /b/ then in a hundred word sample, there should be (by chance) one case where the translation of a word starting with /t/ in the first language starts with /b/ in the second. In a 1000 word sample there should be about ten such cases. One rough guide to frequency of a consonant is simply 1 over the number of consonants in the inventory.
So if you have twenty consonants the average frequency of each consonant is 1/20 or .05. If you have a Macintosh with a graph calculator try entering this formula: (1/x^2) * n * 100 (one over x squared, times n, times 100). This gives you the expected number of correspondences, in a sample of n times 100 word pairs, for two languages both with x consonants, evenly distributed. You can see from this that as long as the average size of the consonant inventory is greater than 10 (or put another way, where no consonant occupies more than ten percent of the word positions being compared) the expected number of chance matches in a 100 wd sample is between 1 and 0. That is, in a 100 word sample you expect that each consonant (in initial position) in one language will match up with each consonant in the other in one word or not at all. In a 1000 word sample the expected chance avgs. are not all that much higher-- basically if the average size of the consonant inventories is 14 (or the avg. frequency no more than 1/14), you only expect to get 5 chance correspondences, though below 14 the expected number starts to climb dramatically. (At 5 the expected number is 40). The next question is how far above the average do we have to get before coincidence becomes an absurdly unlikely explanation. There is a formula for this, but I won't go through it since this post has gotten long. But here is one example: In the case where two languages both have 20 consonants evenly distributed (or more realistically in comparing two consonants in two languages both of which have a frequency of 5% in the word-position being compared in their respective languages), the probability of finding more than 5 correspondences (i.e. 6 or more) in a 100 wd. sample is 0.000000356, or roughly 1 in 2.8 million. (The chance of finding 5 or more is roughly 1 in 163,000.)
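As a rough check on the arithmetic above, here is a minimal Python sketch that is not part of the original post. It assumes that each word pair is an independent trial with a fixed match probability (the product of the two consonant frequencies), so that the number of chance matches is binomially distributed; the function names are made up for illustration. The exact tail values depend on that model and need not agree digit for digit with the figures quoted in the post.

```python
from math import comb


def expected_chance_matches(freq_a: float, freq_b: float, n_pairs: int) -> float:
    """Expected number of chance matches: frequency in A x frequency in B x word pairs."""
    return freq_a * freq_b * n_pairs


def prob_at_least(k: int, n_pairs: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n_pairs, p), computed as 1 - P(X < k).

    Assumption: word pairs are independent trials, each matching with probability p.
    """
    below = sum(comb(n_pairs, i) * p**i * (1 - p) ** (n_pairs - i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    # Two consonants, each with 5% frequency in the position compared, 100 word pairs.
    p = 0.05 * 0.05
    print("expected chance matches:", expected_chance_matches(0.05, 0.05, 100))
    print("P(5 or more):", prob_at_least(5, 100, p))
    print("P(6 or more):", prob_at_least(6, 100, p))
```

Under these assumptions the expected count for the 5% case is 0.25 matches per 100 pairs, and both tail probabilities come out very small (a few in a million and a few in ten million respectively), which is the point of the example.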
So in this set of circumstances "6 or more" (i.e. a single correspondence set occurring in a given position-- say word-initially-- in 6 or more words) should be pretty well conclusive for demonstrating a non-chance and hence almost certainly historical (genetic or contact) relationship. I think that working all this out mathematically is interesting and important for comparative linguistics for two reasons. First it means that if you apply the comparison strictly (allow only one-to-one word comparisons, and one-to-one phoneme comparisons) you can get more knowledge from less information-- you can potentially demonstrate a relationship with much less data than comparativists have traditionally thought necessary. This is important to me, because I work in Afroasiatic, where the perpetual concern is exactly how to get more knowledge with less information (few old texts for most languages). But the other side of this is that the mathematics makes it perfectly clear that if you relax the semantic and phonemic criteria far enough, you quickly come to a point where the expected number of chance correspondences becomes so high that it becomes practically impossible to mount an effective demonstration of a relationship. The relevant parameters are number of comparisons and frequency of consonants. If you allow for comparison of each word with a wide range of semantically close words you multiply the number of comparisons and effectively increase the sample size. (A pair of 1000 wd-lists with one-to-one matching is the same mathematically as two 100 wd. lists with each word compared with 10 words in the other language-- both give 1000 pairs or trials). Going back to the previous example with frequency of 5% for each consonant, the number of matches you need to get to the 1 in a million or better range for different sample sizes are: 200-8, 500-10, 1000-14, 2000-19.
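The thresholds above can be explored with a short Python sketch, again under assumptions that are mine rather than the post's: chance matches are binomially distributed with p = 0.05 x 0.05 per word pair, and "1 in a million" is read as a tail probability of at most 1e-6. The helper names are hypothetical. Because the post does not state its exact cutoff or method, the values this returns may differ by one or so from the ones listed.

```python
from math import comb


def effective_pairs(list_size: int, candidates_per_word: int) -> int:
    """Number of trials when each word is compared with several candidate words."""
    return list_size * candidates_per_word


def tail_prob(k: int, n_pairs: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n_pairs, p), computed as 1 - P(X < k)."""
    below = sum(comb(n_pairs, i) * p**i * (1 - p) ** (n_pairs - i) for i in range(k))
    return 1.0 - below


def matches_needed(n_pairs: int, p: float, alpha: float = 1e-6) -> int:
    """Smallest k such that k or more chance matches has probability <= alpha."""
    k = 0
    while tail_prob(k, n_pairs, p) > alpha:
        k += 1
    return k


if __name__ == "__main__":
    p = 0.05 * 0.05  # both consonants at 5% frequency (the example in the text)
    for n in (200, 500, 1000, 2000):
        print(n, "pairs -> matches needed:", matches_needed(n, p))
    # 100 words each compared with 10 candidates is the same number of trials
    # as a 1000-word list with strict one-to-one matching.
    print(effective_pairs(100, 10) == effective_pairs(1000, 1))
```

Passing the result of `effective_pairs` as `n_pairs` shows how quickly the required number of matches grows once each word is allowed several semantic counterparts.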
In other words although the average number of expected chance correspondences increases geometrically with sample size, the number needed for reasonable certainty of non-chance goes up at a higher rate. If you are considering each word in a 1000 wd list against 20 or 30 semantically close words, the effective sample size-- and hence the number of matches needed to demonstrate a non-chance relationship-- becomes gigantic. (I don't have a calculator powerful enough to calculate it though, sorry.) Similarly if you allow many-to-many phoneme matchings, you effectively increase the frequency. If you compare two systems of 15 consonants at 3 points of articulation one-to-one the chance of a match is on average 1/15 squared. The expected number of chance matches in a 1000 word sample is between 4 and 5 (4.44)-- reasonable. The chance of matching any two consonants at the same point of articulation is 1/3 squared. In a 1000 wd. sample the expected number of chance matches is 111-- a big jump. Thus with very loose criteria, the comparatist is in the paradoxical position of having to prove the existence of hundreds of "bad" (random) correspondences in order to have any confidence of having found any good ones (ones which actually reflect language history). And if there really are any good correspondences, the problem of how to pick them out from all the random "noise" which is certain to be there is daunting. -- ----------------------------------------------------------- Robert R. Ratcliffe Associate Professor, Arabic and Linguistics, Dept. of Linguistics and Information Science Tokyo University of Foreign Studies Asahi-machi 3-11-1, Fuchu-shi, Tokyo 183-8534 Japan
From kroch at change.ling.upenn.edu Thu Sep 14 01:33:29 2000 From: kroch at change.ling.upenn.edu (Tony Kroch) Date: Wed, 13 Sep 2000 21:33:29 EDT Subject: Announcing the second edition of the Penn-Helsinki Parsed Corpus of Middle English Message-ID: ----------------------------Original message---------------------------- The second edition of the Penn-Helsinki Parsed Corpus of Middle English (PPCME2) is now publicly available under the conditions outlined below. It consists of 55 text samples containing 1.3 million words of syntactically annotated Middle English prose and ranging over four time periods, from 1150 to 1500. Like the first edition of the PPCME, the PPCME2 is based on the Middle English portion of the Helsinki Corpus of English Texts that was created at the University of Helsinki under the direction of Matti Rissanen and Ossi Ihalainen. The size of the text samples in the second edition has been enlarged so that the total corpus size is nearly three times larger. In addition, the corpus is now tagged for part of speech and the syntactic annotation system is richer. For the earliest time period, all texts except one are complete; the exception is the Ancrene Riwle sample, which contains approximately 50,000 words. For the later time periods, two texts per time period were expanded to approximately 50,000 words. The remaining texts are represented by the Helsinki Corpus sample. The PPCME2 is being distributed on a CD-ROM that includes several files for each text in the corpus: - a file with unannotated text - a file with philological and other information about the text (manuscript and edition used, date, dialect, genre, and word count of the sample) - a file in which individual words are tagged for part of speech - a file that is annotated for syntactic structure Available with the corpus is CorpusSearch, a Java program written by Beth Randall that runs under Unix, Linux, MacOS and Windows.
CorpusSearch uses standard syntactic predicates like ``(immediately) precedes'', ``(immediately) dominates'', and Boolean combinations thereof, and it allows the output of a previous search to be used as input to further searches. To order the PPCME2, please go to http://www.ling.upenn.edu/mideng and follow the instructions there. The cost of a subscription to the corpus is $200 and the cost of a license for CorpusSearch is $50. The items may be purchased together or separately. Proceeds from the sale of the corpus will pay for improving the corpus and for increasing its size over time. Proceeds from the sale of CorpusSearch will go to the author. The PPCME2 was designed and built by Anthony Kroch and Ann Taylor at the University of Pennsylvania. Supplementary assistance was provided by Beatrice Santorini. The PPCME2 is part of a larger project to produce a parsed diachronic corpus of English from 800 to 1800. The Old English part is under construction at York under the direction of Anthony Warner, Susan Pintzuk, and Ann Taylor and the Early Modern English part is under construction at the University of Pennsylvania under the direction of Kroch and Santorini. From Ann.Kumar at anu.edu.au Thu Sep 14 11:01:01 2000 From: Ann.Kumar at anu.edu.au (Ann Kumar) Date: Thu, 14 Sep 2000 07:01:01 EDT Subject: "only six" argument Message-ID: We have been following the HISTLING discussion initiated by Larry Trask with interest, because we have been involved over the last two years in a particular case that had to solve the problem of the amount of data that is necessary to establish relatedness. (Not genetic, but via borrowing). We have been doing what Robert R. Ratcliffe takes as his starting point in his last e-mail, i.e. "approaching unclassified languages or languages which haven't been compared before [where] the first question we have to ask is whether these languages have something in common which cannot be due to chance or coincidence."
The results will be published in the December issue of Oceanic Linguistics, but we thought it might interest LIST members to have a sneak preview, at least of the (rather long) section on probability, where we discuss relevant issues. (The section is attached.) We were trying to find out whether some semantic and phonological matches in Old Japanese and Old Javanese lexis were too extensive to be due to chance. In this particular case, rather than looking at single sound correspondences, we used whole-word comparison, and of longer words (CVCVC structure) with recurrent sound correspondences. While it is not possible to go into the calculations here, it turned out that in this case only one match between words of this length could be expected to occur by chance. In the section on probability Rose discusses the usefulness of the approaches taken earlier by Nichols and Ringe and goes on to propose that a Bayesian, rather than frequentist, statistical approach should be the preferred option. We have attached this section. We agree with Ratcliffe that "Numerical criteria and probability theory are the most reliable means for making judgements of this type". But we are able to demonstrate a few more things that might interest LIST readers (and can also offer some real data!). As mentioned, we also have some points to make concerning the appropriateness of the frequentist (as opposed to a Bayesian) paradigm for evaluating questions of this kind (i.e. assessing the probability of a hypothesis). (Bayesian formulations are used, for example, in forensics. We don't know to what extent historical linguists are aware of them, so we offer them in case people are interested.) Ann Kumar Phil Rose [Attachment short_prob.doc was scrubbed from the archive.]
=========================================================================== Dr Ann Kumar Vice-President, Australian Academy of the Humanities Centre for the Study of Asian Societies and Histories Faculty of Asian Studies Canberra ACT 0200 Australia Tel. (02) 6249 3677/4658 fax. (02) 6279-8326 From X99Lynx at aol.com Mon Sep 25 14:55:08 2000 From: X99Lynx at aol.com (Steve Long) Date: Mon, 25 Sep 2000 10:55:08 EDT Subject: Superlative Forms and Swallowing Camels Message-ID: ----------------------------Original message---------------------------- On Sun, 3 Sep 2000 10:19:19 EDT. jer at cphling.dk wrote: <> (Hi, Jens!) I must ask of course how we know that one language or the other already had "a perfectly good form of the superlative?" With all due respect to the writer, from whom I've already learned a great deal, I must ask whether the case is as clear cut as he perceives it. This camel might be the kind you find in animal cracker boxes -- bite-sized. Ironically, two relevant languages make no morphological distinction between the comparative and the superlative - Manx and French. If this says nothing else, it proves that languages can find themselves without any form of the superlative, much less "a perfectly good one." Whatever forces caused the loss of the superlative in those languages may have caused an earlier loss in either Celtic or Italic. And that would have meant one or the other of those two languages may have been in need of a superlative form and therefore had a very good reason to borrow it. And doesn't the question <> work both ways? Why would "Italo-Celtic" innovate a superlative form when they already had a perfectly good one?
In my mind this raises again the question of how one distinguishes between a borrowing and descent from a common ancestor, IF the word or form is actually old enough to predate indicia of borrowing. Also, there are those of us who suspect that going back 4000+ years creates a great deal of uncertainty about what languages -- both IE and non IE -- the form could have been borrowed from. The reconstruction the author offers -- "the Celtic superlative in *-isamo- and the Italic one in *-is(s)amo- cannot be imagined to be parallel developments (from *-mHo- [whence Ital./Celt *-amo-] with deictic vs. *-isto with other adjectives)" -- does not foreclose the possibility that the development is one that occurred in some third language (or the specialized dialect of an influential, itinerant linguistic community -- like scribes or priests) and that both Latin and Celtic "borrowed" it independently. And finally, why would a language borrow a word like "superlative" when presumably back in the days of Old English, it "already had a perfectly good one?" Regards, Steve Long From larryt at cogs.susx.ac.uk Tue Sep 26 14:35:03 2000 From: larryt at cogs.susx.ac.uk (Larry Trask) Date: Tue, 26 Sep 2000 10:35:03 EDT Subject: Sum: German Forst 'forest' Message-ID: ----------------------------Original message---------------------------- Some days ago I posted a query about the disputed etymology of German <Forst> 'forest'. I got only three replies, but those were interesting. The query was whether German <Forst> derives, like English 'forest', from a late Latin word, or whether it is a native word derived ultimately from the German word for 'fir tree'. Two of the respondents were skeptical of the German etymology. One of them suggested it might be a residue of the unfortunate Romantic tendency to seek "Germanic" etymologies for loans from Latin.
The third, however, was much more enthusiastic about the Germanic etymology, and noted that the derivation of late Latin <forestis> from <foris> 'outside' is far from secure, and that a loan from Germanic has been suggested. Well, turnabout is fair play, I guess. Anyway, it appears that I cannot yet add 'forest' and <Forst> to my little collection of striking chance resemblances. But one of my respondents (SG) sent in a couple of lovely examples of chance resemblances: German /Scheune/ "shack" : Coptic /shoine/ id. German /Schuh/ "shoe" : Itelmen /sxu/ (works even better with Dutch) and so on. (Itelmen is a Chukcho-Kamchatkan language of eastern Siberia.) My thanks to David Fertig, Stefan Georg, and Paolo Ramat. Larry Trask COGS University of Sussex Brighton BN1 9QH UK larryt at cogs.susx.ac.uk Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad) Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad) From larryt at cogs.susx.ac.uk Wed Sep 27 11:35:18 2000 From: larryt at cogs.susx.ac.uk (Larry Trask) Date: Wed, 27 Sep 2000 07:35:18 EDT Subject: Q: Sarich and historical linguistics Message-ID: ----------------------------Original message---------------------------- In a few weeks, I'm giving a talk on the perception of language and linguistics among our academic colleagues in other disciplines, such as psychology, anthropology, archaeology, primatology and genetics. Most of this talk will deal with non-historical matters, but I want also to talk about the seemingly immense influence of the long-rangers among our colleagues, who often appear to believe that the long-rangers speak for historical linguistics. See, for example, the writings of the geneticist Robert Sokal, of the palaeoanthropologist Richard Klein, and of the primatologist Robin Dunbar. But I've become particularly interested in the writings of the eminent molecular anthropologist Vincent Sarich, one of the founders of the out-of-Africa hypothesis of human origins.
Unlike most other non-linguists, Sarich has stepped into historical linguistics in a big way -- and he doesn't like us historical linguists very much. In a 1994 article, he warmly defends the long-rangers, and he hurls abuse at those linguists who have criticized their work, accusing the critics of being anti-scientific and of acting from the basest motives:

Vincent M. Sarich (1994), 'Occam's razor and historical linguistics', in M. Y. Chen and O. J. L. Tzeng (eds), In Honor of William S.-Y. Wang, Pyramid Press, pp. 409-430.

But I'm more interested right now in another of Sarich's articles, published on the Web in 1994 and apparently not published elsewhere. This article also carries a good deal of abuse directed at the critics of Greenberg and Ruhlen:

http://pubpages.unh.edu/~jel/sarich.html

Here is the passage I'm interested in:

"A similar scenario would also appear to apply in the linguistic realm, but to see it we first need to challenge the extremely conservative current consensus among most linguists that relationships among languages that diverged more than perhaps 7,000-8,000 years ago are, at present, unknowable. A simple exercise suffices here to show that this consensus is unreasonably pessimistic. One simply sits down with, for example, Buck's A Dictionary of Selected Synonyms in the Principal Indo-European Languages, a basic word list, and some independent knowledge of two or more languages representing distinct Indo-European groups. I used English and Croatian, representing, respectively, its Germanic and Slavic branches. If one then asks what proportion of the words in modern Croatian appear, simply by inspection (but allowing for some phonetic and semantic drift), to be cognate with the reconstructed Proto-Indo-European (PIE) form (or, where that is unavailable, the English word), one gets a minimum figure of about 60%. For example, snow, snjeg, *sneigwh; many, mnogo, *monogho; blood, krv, *kru; tree/wood, drvo, *dru; earth, zemlja, *ghem.
Similar results were obtained using native speakers of Spanish and Bengali, and for Armenian and Albanian using Decsy's The Indo-European Protolanguage: a Computational Reconstruction. Thus 60% survival seems to be a reasonably representative figure for the survival of PIE roots with meanings in extant Indo-European languages.

"Now obviously some number of these matches will be coincidental (though that number will likely be small, as illustrated by the fact that Chinese, by the same test, will show less than 10% apparent 'cognacy' with PIE, English, or Croatian -- I am indebted to Dr W S-Y Wang for this comparison), but, by the same token, some will be missed when the degree of phonetic or semantic change makes cognacy less than obvious. For example -- foot, noga, *ped -- where one might miss the English correspondence because of the phonetic changes, and would (and, perhaps, should) certainly miss the Croatian unless one remembered that 'pod' in Croatian means 'under', and that an association between 'under' and 'foot' is perfectly reasonable. This would imply a cognacy loss of less than 10% per millennium along a lineage, implying that even at a time depth of 12,000-14,000 years; that is, twice the probable time which separates modern Croatian from its Proto-Indo-European ancestor, one might retain 30% or so phonetic/semantic cognacy. Thus one could recognize relationships among languages whose common ancestor lay that far in the past provided that one looked at a sufficient number of them, and avoided simple binary comparisons. That is, if each of two descendant languages retains 30% cognacy with the ancestral language, they will, on average, share only 9% [(0.3)^2] with one another -- and this gets into the chance area of similarity. On the other hand, if you look at 10 such languages, three, on the average, will retain a particular cognate -- greatly increasing your chances of recognizing relationships among them, and of reconstructing the ancestral form.
This is the procedure and argument of Greenberg [(1987); see also discussion in Ruhlen (1987)], and, whatever the questions that might be raised about certain details, there can be no doubt the current general consensus among most linguists that relationships among languages older than about 7,000 years are, at present, unknowable, is unrealistically and unreasonably pessimistic and conservative."

[END QUOTE]

Now, many of these general issues have been much discussed elsewhere, and I have my own views, which I will reserve for the time being. But I am interested in hearing comments from colleagues on any part of this passage, though most particularly on the following points:

* the use to which Sarich puts Buck's dictionary;
* the claim that any given living IE language retains about 60% of the PIE lexicon in easily recognizable form;
* the claim that genuine cognates among living IE languages are overwhelmingly obvious and trivial to identify by inspection alone;
* the claim that this result automatically generalizes to other families, even to families which are as yet unrecognized.

Please reply directly to me, since I have no wish to flood this list with discussions of long-ranger work. I'll post a summary when I can.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

Tel: 01273-678693 (from UK); +44-1273-678693 (from abroad)
Fax: 01273-671320 (from UK); +44-1273-671320 (from abroad)

From DISTERH at UNIVSCVM.SC.EDU Fri Sep 29 12:17:40 2000
From: DISTERH at UNIVSCVM.SC.EDU (Dorothy Disterheft)
Date: Fri, 29 Sep 2000 08:17:40 EDT
Subject: Sarich and historical linguistics
Message-ID:

In a message dated 9/27/2000 6:36:24 AM, larryt at cogs.susx.ac.uk writes:

<<... I want also to talk about the seemingly immense influence of the long-rangers among our colleagues, who often appear to believe that the long-rangers speak for historical linguistics.
...I've become particularly interested in the writings of the eminent molecular anthropologist Vincent Sarich, one of the founders of the out-of-Africa hypothesis of human origins. Unlike most other non-linguists, Sarich has stepped into historical linguistics in a big way -- and he doesn't like us historical linguists very much.>>

I hope Larry and everyone else will understand my posting this to the list. I think it's important to just add a few observations about Sarich that may put his remarks in context.

First of all, it should be remembered that Vincent Sarich has for a long time taken an advocacy position (and called himself an advocate) regarding certain aspects of human genetics. He has been for example prominently involved in the dialogue on race and IQ. And it should also be noted that the article Prof Trask cites (http://pubpages.unh.edu/~jel/sarich.html), entitled "RACE and LANGUAGE in PREHISTORY", is clearly a piece of "advocacy," which obviously treats historical linguistics only as it relates to and serves to advance Sarich's goals with regard to a somewhat larger argument.

Sarich's position on Greenberg and longrangers is pretty much dictated by the Out-of-Africa hypothesis and various other positions Sarich takes regarding genetics and human culture.

What is clear from the piece is that Sarich is trying to backdate language far enough to make its diversity correlate with current human genetic diversity. Sarich advocates the view that modern human diversity, human intelligence and cultures were born full-blown at some point after the Out-of-Africa event some 100,000 years ago -- with relatively little convergence since. In the piece, his argument with scientists claiming that language is a recent development is expressly motivated by his position that languages match up with racial genetics. Sarich is not really a lumper in the strict sense.
And given all the above some caution might be called for in using Sarich as representative of an academic non-linguist's views of historical linguistics. I suspect that if it better served his larger purposes, he would be citing Lehmann and Trask.

This isn't the first time of course that historical linguistics has been called upon to support wider conclusions about human history. Sarich is fairly unique however in viewing certain elements of it as supporting conclusions that reach back some 30,000 years.

It should be said that there are serious scientists who are not comfortable with Sarich's understanding of the evidence of paleo-culture, much less of his understanding of paleo-language. (And that's not to say that the genetic implications of Out-of-Africa haven't been challenged either.)

Some of us think Sarich may be seriously underestimating paleo-humans and how long it took to develop something as sophisticated as human biology, human culture and human language. On another web page, for example, one can find an article by the formidable paleobiologist Henry Gee about the Schöningen spears. (http://quartz.ucdavis.edu/~GEL115/spears.html) To some, the sophistication and possibly accumulative design of these 400,000 year old hunting javelins suggests that they could not have been developed or redeveloped in a single generation. And accumulating and transmitting complex knowledge from one generation to the next suggests some form of transmission, perhaps some form of language.

Finally, I'd point out also that maybe it is the traditional assumption of strict vertical descent in languages that makes any part of historical linguistics attractive to Vincent Sarich and his "anti-convergenist" monogenetic polemics.
Those of us who think that there may be a relatively high degree of convergence in linguistic history don't find commonalities between languages extremely precise in illuminating prehistory or necessarily indicative of some common noble biological ancestor. After all, the most basic function of language is communication and that should move us all to try to speak the same language, not different ones.

And, of course, it's refreshing for us "convergenists" to see that the primacy of vertical descent has recently taken a good drubbing in biology. (See, e.g., Stephen Jay Gould's "Linnaeus's Luck" in Natural History, September 2000). And some of us expect the same to eventually happen on a different level to "Out-of-Africa".

In the meantime, it might be suggested that Vincent M. Sarich's views are not at all the best reflection of how informed non-linguists understand historical linguistics.

Steve Long
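
The retention arithmetic in the Sarich passage quoted earlier in this thread can be checked with a short sketch. It is only an illustration under stated assumptions, not Sarich's own procedure: the function names are hypothetical, the 6.5-millennium depth is simply the midpoint implied by his "12,000-14,000 years ... twice the probable time", loss is assumed constant per millennium, and the languages are assumed to retain a cognate independently of one another.

```python
from math import comb


def per_millennium_retention(retained, millennia):
    """Constant per-millennium retention rate that yields `retained` after `millennia`."""
    return retained ** (1.0 / millennia)


def retention_after(rate, millennia):
    """Fraction still retained after `millennia` at a constant per-millennium rate."""
    return rate ** millennia


def prob_at_least(k, n, p):
    """Binomial probability that at least k of n independent languages retain an item."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: 60% retention over about 6.5 millennia (midpoint of 6,000-7,000 years).
rate = per_millennium_retention(0.60, 6.5)

# Doubling the time depth squares the retained fraction.
deep = retention_after(rate, 13.0)

# Figures taken from the quoted passage: 30% retention per language, 10 languages.
pairwise = 0.30 ** 2          # share of items two descendants both retain
expected = 10 * 0.30          # average number of languages keeping a given item

# Not in the passage: chance that at least two of the ten keep a given item,
# which is the minimum needed for any comparison of that item at all.
at_least_two = prob_at_least(2, 10, 0.30)

print(f"loss per millennium: {1 - rate:.3f}")
print(f"retention at 13 millennia: {deep:.2f}")
print(f"pairwise shared: {pairwise:.2f}, expected retainers: {expected:.1f}")
print(f"P(at least two of ten retain): {at_least_two:.3f}")
```

Under these assumptions the loss rate stays below the "less than 10% per millennium" of the passage, and doubling the depth gives 0.36, which the passage appears to round down to "30% or so". The sketch says nothing about whether inspection can actually separate true cognates from chance matches, which is the point Trask asks about.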