From hubeyh at mail.montclair.edu Thu Jan 2 02:56:05 2003
From: hubeyh at mail.montclair.edu (H.M.Hubey)
Date: Wed, 1 Jan 2003 21:56:05 -0500
Subject: [language] S.Giannini - S.Scaglione - Abstract

<><><><><><><><><><><><>--This is the Language List--<><><><><><><><><><><><><>

http://www.humnet.unipi.it/~medtyp/gian.html

---<><><><><><><><><><><><>----Language----<><><><><><><><><><><><><>
Copyrights/"Fair Use": http://www.templetons.com/brad/copymyths.html
The "fair use" exemption to copyright law was created to allow things such as commentary, parody, news reporting, research and education about copyrighted works without the permission of the author. That's important so that copyright law doesn't block your freedom to express your own works -- only the ability to express other people's. Intent, and damage to the commercial value of the work are important considerations.
You are currently subscribed to language as: language at listserv.linguistlist.org
To unsubscribe send a blank email to leave-language-4283Y at csam-lists.montclair.edu

From hubeyh at mail.montclair.edu Thu Jan 2 02:59:01 2003
From: hubeyh at mail.montclair.edu (H.M.Hubey)
Date: Wed, 1 Jan 2003 21:59:01 -0500
Subject: [language] http://www.ai.uga.edu/~mc/#HIST

Computers in historical linguistics

(NEW) An algorithm to align words for historical comparison (PDF) (PostScript)
This is the first step in a computer implementation of the Comparative Method. A revised version of this paper has been published in Computational Linguistics.

(NEW) Aligning multiple languages for historical comparison (PDF)
Extension of the above to more than 2 languages at a time. Presented at COLING-ACL '98.
(NEW) The number of distinct alignments of two strings (PDF) (PostScript)
Unfinished draft of a paper co-authored with E. R. Canfield.

From hubeyh at mail.montclair.edu Thu Jan 2 03:02:07 2003
From: hubeyh at mail.montclair.edu (H.M.Hubey)
Date: Wed, 1 Jan 2003 22:02:07 -0500
Subject: [language] quant linguistics

http://www.uta.edu/english/tim/courses/4301f98/linglinx/gen.html

From hubeyh at mail.montclair.edu Thu Jan 2 17:20:50 2003
From: hubeyh at mail.montclair.edu (H.M.Hubey)
Date: Thu, 2 Jan 2003 12:20:50 -0500
Subject: [language] The Significance of Word Lists

http://csli-publications.stanford.edu/site/1575863006.html

From hubeyh at mail.montclair.edu Wed Jan 8 15:12:53 2003
From: hubeyh at mail.montclair.edu (H.M. Hubey)
Date: Wed, 8 Jan 2003 10:12:53 -0500
Subject: [language] [Fwd: Did Early Humans Mate With The Locals? Human Genome Data Cast Doubt On "Replacement Theory" Of Human Evolution]

Source: University Of Utah
Date: 2002-12-26

Did Early Humans Mate With The Locals?
Human Genome Data Cast Doubt On "Replacement Theory" Of Human Evolution

A new analysis of human genetic history deals a blow to the theory that early people moved out of Africa and completely replaced local populations elsewhere in the world. The findings suggest there was at least limited interbreeding between our African ancestors and the residents of areas where they settled.

"The new data seem to suggest that early human pioneers moving out of Africa starting 80,000 years ago did not completely replace local populations in the rest of the world," says Henry Harpending, a University of Utah anthropology professor and co-author of the new study. "There is instead some sign of interbreeding."

If that conclusion is correct, it contradicts the "replacement theory" of human evolution - a theory Harpending has advocated for more than a decade. "Hypotheses are called into question by data every day in science. That's the way it works," he says.

The journal Proceedings of the National Academy of Sciences is publishing the new findings in its online edition the week of Dec. 23, 2002. The study's 20 co-authors include three from the University of Utah: Harpending; Alan Rogers, also a professor of anthropology; and Stephen Wooding, a postdoctoral researcher in human genetics.

The study was led by anthropologist Stephen Sherry and mathematician Gabor Marth of the National Center for Biotechnology Information at the National Institutes of Health in Bethesda, Md. Sherry is a former student of Harpending's when both were at Pennsylvania State University. Other co-authors of the new study are from the Washington University School of Medicine in St. Louis, The Johns Hopkins University School of Medicine in Baltimore and the University of California, San Francisco.

Most anthropologists agree human ancestors first spread out of Africa roughly 1.8 million years ago, establishing new populations in Europe, Asia and elsewhere.
The "multiregional theory" holds modern humans evolved from those multiple populations. The competing "replacement theory" says that the local populations, including Europe's Neanderthals, went extinct when they were replaced roughly between 80,000 and 30,000 years ago by another wave of human immigrants from Africa.

Scientists can analyze ancient genetic mutations in modern people to learn about how humans evolved and the size of the human population over time. Mutations occur at a relatively steady rate over time. If the human population were large at a specific point in prehistoric time, more mutations would occur, resulting in greater diversity in genetic mutations found in modern people. A small population of human ancestors would result in fewer mutations, so modern humans would display less genetic diversity.

So a person's genetic material "contains the whole history of the population from which you descended," Harpending says.

Earlier studies of genetic material known as mitochondrial DNA and microsatellites supported the notion that a small group of perhaps 5,000 to 20,000 primitive humans migrated from East Africa, spread around the world, and rapidly expanded in population as they replaced other human populations elsewhere in Africa 80,000 years ago, and in Asia 50,000 years ago and Europe about 35,000 years ago.

The new study, however, analyzed mutations called SNPs (single nucleotide polymorphisms) in DNA from the nucleus of human cells studied for the Human Genome Project, the effort to map the entire human genetic blueprint.

The analysis indicates there was a bottleneck in the human population - what looks like a sharp reduction in the number of people - when ancestors of modern humans colonized Europe roughly 40,000 years ago.
Researchers are not sure what this means because it conflicts with studies of other kinds of human genetic information, which support the idea that a rapidly expanding African population spread globally and replaced local populations elsewhere.

"If Africans moved out of Africa and then populated the whole world, we would see that in the genetic evidence as an expansion in population size," yet the new study indicated the population shrank instead, Rogers says.

The evidence five years ago indicated migrating Africans did not interbreed with local populations, while the new study indicates they did, Rogers notes, adding that the conflicting genetic data mean "the question is still open."

Harpending says one possible explanation for the new data is that there was a large population of humans who migrated from Africa, yet they kept largely to themselves and mated only to a limited extent with local populations in Europe and elsewhere. Because interbreeding still was uncommon, only a few of the prehistoric European genes were incorporated into the modern human genetic blueprint, giving a false impression that the prehistoric human population collapsed or shrank in size, Harpending says.

Another possibility is that the prehistoric African population was large 100,000 years ago, but only a very small number - perhaps a few dozen - of those Africans migrated to other areas some 80,000 years ago, ultimately replacing local populations. That would explain why the human genetic blueprint could give a false impression that the human population collapsed in size even if it did not. But Harpending believes it is unlikely that such a small number of migrants from Africa could spread globally and ultimately replace other populations.

Editor's Note: The original news release can be found here.
http://www.utah.edu/unews/releases/02/dec/genome.html

-------------------------------------------------------------------------------
Note: This story has been adapted from a news release issued for journalists and other members of the public. If you wish to quote any part of this story, please credit University Of Utah as the original source. You may also wish to include the following link in any citation: http://www.sciencedaily.com/releases/2002/12/021226071610.htm

News in Brain and Behavioural Sciences - Issue 81 - 14th December, 2002
http://human-nature.com/nibbs/issue81.html

M. Hubey
-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o
The only difference between humans and machines is that humans can be created by unskilled labor. Arthur C. Clarke
/\/\/\/\//\/\/\/\/\/\/ http://www.csam.montclair.edu/~hubey

From hubeyh at mail.montclair.edu Sun Jan 19 05:27:02 2003
From: hubeyh at mail.montclair.edu (H.M. Hubey)
Date: Sun, 19 Jan 2003 00:27:02 -0500
Subject: [language] Chuvash initial-l, and Common Turkic initial-t

I have a set of cognates from the two Turkic families.
I got the Chuvash words from Paasonen, but I could only get the Turkish translation, so I had to translate them back into English. I am sure they are cognates. But the only time I have seen t=l is in Hittite tabarna=labarna. More strange parallels follow. I have looked at Watkins and Buck, and Bomhard's Nostratic books. I am still not sure about certain things. Here they are:

1) Chuvash lapatka bileyi tahtası; kenarlarına katran sürülmüs,, saplı bir tahtadır ki üzerindeki tırpanlar bilenir; çamas,ır tokmag(ı [rusçası: ???????] [Paasonen08:82];
This is a board for sharpening scythes. It is also used for beating (washing) clothes.
Turkic tapla (to sharpen); found at least in Karachay-Balkar

2) Chuvash lapa(rdat suda çalkalanmak[Paasonen08:82]; To be rinsed in water (that is what it says)
Turkic shIpIrdat to make gurgling noises

3.a) Chuvash lapa(rkka çamur (yollarda); kir, pislik, deg(ersiz, is,e yaramaz[Paasonen08:82]; (mud, dirt, worthless)
Turkic toprak dirt, earthen
The -ge suffix is an ancient Turkic suffix (Clauson).

3.b) Chuvash lapra kir, çamur[Paasonen08:83]; (dirt, mud)
It looks like Chuvash retains both versions. It looks like the word for "earth" in Turkic should be more like *torpang so that the original root *tor becomes cognate with Latin terra, Sumerian tir, Turkic toz (dust), Turkic tuz (salt), Turkic turi (sour), etc. Common Turkic for "land, earth" is cer/yer, but the initial *c/y is reconstructed as deriving from *d (Doerfer); thus der also fits into the same scheme. This word has been found in runic inscriptions in the North Caucasus (Tavkul...)

4) Chuvash lapa(stat gürültü ile vurmak, kakmak[Paasonen08:82]; (to hit, push, shove noisily)
Turkic tep to kick, hit, move; it shows up in various words: tekme (<*tepme) a kick, tebre (tepre) to move, to kick, etc

5) Chuvash lac(ag^a çamur[Paasonen08:82]; Shows up in Turkish as lachka, evidently a borrowing

6) Chuvash la?an çamas,ır teknesi, leg(en [Kaz.
la?an] [Paasonen08:82]; tok, tegen, ogen (a trough for washing clothes)
It shows up in Turkic as legen, another borrowing. It shows up also in words like tegene (a tub, basin) tekne (trough). It seems to be related to to"k (to pour).

7) Chuvash laja at [Kaz. alas,a] [Paasonen08:82]; (horse. Shows up in Kruger's book as loshad)
Turkic languages (along with some Uralic languages) have it as alasha. The word seems to derive from en/in (to go down), which also shows up in Turkish as o"n (front), e.g. o"nde (in front of). The words al/il show up as "front" (alda, ileri, etc) but as "low" as in alasha, and Turkish alchak. I derived these and related them to "donkey", ass, anshe, etc. once on sci.lang, e.g. alasha, alashang, onos, onager, donkey > doneg.
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3DBF1AB4.7070205%40mail.montclair.edu&prev=/groups%3Fq%3Dalashang%2Bgroup:sci.lang.*%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D3DBF1AB4.7070205%2540mail.montclair.edu%26rnum%3D1

8a) Chuvash lap ova, düz yer [Bas,k. K. lapak alçak] [Paasonen08:83]; (plain, flat ground) But in K Bashk apparently lapak means "low". But in Turkic the t-words are tobbe (top) tepe (hill) tu"b (bottom) to"ben (lower)

8b) Chuvash le(pke bas,ın tepesi [Paasonen08:85]; (the top of the head) see others above

9) Chuvash lapka sulu kar[Paasonen08:83]; (wet snow)

10) Chuvash lar oturmak [Soy. Uyg. R. olur, Yak. olor, Kom. Krm. Kumd. Uyg. R. oltur, Kaz ut?r] [Paasonen08:83]; (to sit, to dwell?)
Turkic tur to stay, to remain, to stand, to dwell
Turkic tu"sh to fall down, go down, sit

11) Chuvash lavkka dükan[Paasonen08:83]; (store) This means that Turkic dukkan, KBal tuken are directly from protoTurkic and lavka is not Russian.

12) Chuvash lek asılı durmak, bulunmak; çatmak, isabet etmek [Kaz. lek, ?lek] [Paasonen08:84]; (to remain hanging, to be found, to hit)
Turkic as to hang e.g. from a wall or something
Turkic il to hang onto, to connect to something e.g.
ilik button-hole, ilin to latch onto something
Turkic tak to hang (transitive)
This last one is constructed like yak (to burn, trans) vs yan (to burn, intrans) from *ya. Other verbs formed similarly: yIk (to knock down, pull down), bak (to look), to"k (to pour, see above). Strangely enough, compare Hittite lak (to burn) and Hittite luk (to knock down). (Are the last two Hittite words from IE roots?)

13) Chuvash lere orada[Paasonen08:84]; (there, at there)
Turkic deri/teri "up to [there]"
KBal alayda "there"
It is very strange because I constructed protoTurkic *th so that *th>l and *th>t to explain these, but in addition to Hittite l-words now we get English *th words, e.g. lere=there

14) Chuvash les' iletmek, götürmek, tas,ımak [Kaz. ilt, it; Kaz. Tob. Ba(r. R. ilt; Uyg. elt; Yak ilt] [Paasonen08:84]; (to forward, to carry)
Turkic tashI to carry
Hittite tarna to carry! Another strange parallel.
Turkish tash (to overflow); Turkic tIsh (outside) e.g. overflow=go outside the bucket. Connected with the scapegoat from Hittite.

15) Chuvash la(g^a kakmak, takırtı etmek[Paasonen08:84]; (to hit, shove, make noises e.g. takIrtI)
Turkic toku to beat up
Turkic tu"y, do"v, tu"g to beat up
Turkic tayak stick

16) Chuvash la(plan dinmek (rüzgar hakkında); yavas,lamak[Paasonen08:84];
KBal tIn to calm down, to be quiet, calm

17) Chuvash la(rg^a mırıldan, mızmız[Paasonen08:85]; (to complain)
KBal tarIghIrgha < * targa? to complain (Irgha is the infinitive case)

18) Chuvash la(sta(rdat titremek, sallanmak[Paasonen08:85]; (to shake, to swing)
Turkish titre to shake, to tremble
KBal tentire to tremble

19) Chuvash la(ymag^a salya [Paasonen08:85]; (spittle, saliva, sputum)

20) Chuvash le(g^e( kepek, tortu[Paasonen08:85];

21) Chuvash le(p ılık[Paasonen08:85]; (warm, tepid)
Turkic tamId to blaze up and many other such words about heat, warmth, iron-working, etc. including words with the shift t>k e.g. kabIn (to catch fire) from apparently something like *tab.

It looks like *th>l and *th > t.
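For anyone who wants to check the pattern mechanically, the list above can be tallied by initial sound. This is only a rough bookkeeping sketch in Python: it assumes the pairings are exactly as given above, taking one Turkic form per item where the list gives one, and it keeps the ASCII transcription used in this post. The names PAIRS, initial and initial_correspondences are made up for illustration. A tally like this only summarizes the list; it says nothing about whether the pairs are really cognates.

```python
from collections import Counter

# (Chuvash form, Turkic form) pairs copied from the numbered list above.
# Item numbers are given in the comments; items without a Turkic
# counterpart in the list are left out.
PAIRS = [
    ("lapatka", "tapla"),          # 1
    ("lapa(rdat", "shIpIrdat"),    # 2
    ("lapa(rkka", "toprak"),       # 3.a
    ("lapa(stat", "tep"),          # 4
    ("lar", "tur"),                # 10
    ("lek", "tak"),                # 12
    ("lere", "teri"),              # 13
    ("les'", "tashI"),             # 14
    ("la(g^a", "toku"),            # 15
    ("la(plan", "tIn"),            # 16
    ("la(rg^a", "tarIghIrgha"),    # 17
    ("la(sta(rdat", "titre"),      # 18
    ("le(p", "tamId"),             # 21
]


def initial(word):
    """Return the initial sound, treating the digraph 'sh' as one unit."""
    return "sh" if word.startswith("sh") else word[0]


def initial_correspondences(pairs):
    """Count how often each (Chuvash initial, Turkic initial) pair occurs."""
    return Counter((initial(c), initial(t)) for c, t in pairs)


counts = initial_correspondences(PAIRS)
for (chuvash, turkic), n in counts.most_common():
    print(f"Chuvash {chuvash}- : Turkic {turkic}- ({n} pairs)")
```

On these thirteen pairs the tally comes out as twelve cases of l- : t- and the single l- : sh- case from item 2.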
Perhaps those with knowledge of the present state of *PIE and *PAA can give me hints as to the most likely phonetic realization of the *th.

-- M. Hubey

From hubeyh at mail.montclair.edu Mon Jan 20 00:23:15 2003
From: hubeyh at mail.montclair.edu (H.M. Hubey)
Date: Sun, 19 Jan 2003 19:23:15 -0500
Subject: [language] Datamining, Statistics and Linguistics

This excerpt is from a brand new and very influential book. The Preface explains why this book was necessary.

-------------------begin here------------------------
The field of statistics is constantly challenged by the problems that science and industry brings to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded in both size and complexity.
Challenges in the areas of data storage, organization and searching have led to the new field of "datamining"; statistical and computational problems in biology and medicine have created "bioinformatics". Vast amounts of data are being generated in many fields, and the statistician's job is to make sense of it all: extract important patterns and trends, and understand "what the data says." We call this learning from data.

The challenges from data have led to a revolution in the statistical sciences. Since computation plays such a key role, it is not surprising that much of this new development has been done by researchers in other fields such as computer science and engineering.

The learning problems that we consider can be roughly categorized as either supervised or unsupervised. In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures; in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures.

This book is our attempt to bring together many of the new ideas in learning, and explain them in a statistical framework. While some mathematical details are needed, we emphasize the methods and their conceptual underpinnings rather than their theoretical properties. As a result, we hope that this book will appeal not just to statisticians but also to researchers and practitioners in a wide variety of fields.

Just as we have learned a great deal from researchers outside the field of statistics, our statistical viewpoint may help others to better understand different aspects of learning:

There is no true interpretation of anything; interpretation is a vehicle in the service of human comprehension. The value of interpretation is in enabling others to fruitfully think about an idea.
Andreas Buja

We would like to acknowledge.......

Trevor Hastie
Robert Tibshirani
Jerome Friedman
May 2001

-- M.
Hubey

From hubeyh at mail.montclair.edu Mon Jan 20 00:59:28 2003
From: hubeyh at mail.montclair.edu (H.M. Hubey)
Date: Sun, 19 Jan 2003 19:59:28 -0500
Subject: [language] Another influential book

Another highly influential book at the forefront. It is a part of a series.

------------------------------Series Foreword------------------
Like bioinformatics, the field of machine learning is interdisciplinary. The goal of building computer systems that can adapt to environments and learn from experience has attracted researchers from many fields, including computer science, engineering, mathematics, physics, neuroscience, and cognitive science. Out of this research has come a variety of techniques that have the potential to transform many scientific and industrial fields. Several research communities have converged on a common set of issues surrounding supervised, unsupervised, and reinforcement learning problems...
Thomas Dietterich

------------------------------Preface ------------------------------------------
In all areas of biological and medical research, the role of the computer has been dramatically enhanced in the last five to ten year period.... The main driving force behind the changes has been the advent of new, efficient experimental techniques, primarily DNA sequencing, that have led to an exponential growth of linear descriptions of protein, DNA and RNA molecules. .....As a result, computational support in experimental design, processing of results and interpretation of results has become essential....

The large amounts of data create a critical need for theoretical, algorithmic, and software advances in storing, retrieving, networking, processing, analyzing, navigating, and visualizing biological information. In turn, biological systems have inspired computer science advances with new concepts, including genetic algorithms, artificial neural networks, computer viruses, and synthetic immune systems, DNA computing, artificial life, and hybrid VLSI-DNA gene chips. This cross-fertilization has enriched both fields and will continue to do so in the coming decades. In fact, all the boundaries between carbon-based and silicon-based information processing systems, whether conceptual or material have begun to shrink. ...

Bioinformatics has emerged as a strategic discipline at the frontier between biology and computer science, impacting medicine, biotechnology, and society in many ways. ...

....conventional computer science algorithms...are increasingly unable to address many of the most interesting sequence analysis problems. This is due to the inherent complexity of biological systems, brought about by evolutionary tinkering, and our lack of comprehensive theory of life's organization at the molecular level. Machine-learning approaches (e.g.
neural networks, hidden Markov models, support vector machines, belief networks), on the other hand, are ideally suited for domains characterized by the presence of large amounts of data, "noisy" patterns, and the absence of general theories. The fundamental idea behind these approaches is to learn the theory automatically from the data, through a process of inference, model fitting, or learning from examples. They form a viable complementary approach to conventional methods. The aim of this book is to present a broad overview of bioinformatics from a machine-learning perspective. ...

An often-met criticism of machine-learning techniques is that they are "black-box" approaches: one cannot always pin down exactly how a complex neural network, or hidden Markov model, reaches a particular answer. We have tried to address such legitimate concerns both within the general probabilistic framework and from a practical standpoint. It is important to realize, however, that many other techniques in contemporary molecular biology are used on a purely empirical basis. The polymerase chain reaction, for example, for all its usefulness and sensitivity, is still somewhat of a black-box technique. Many of its adjustable parameters are chosen on a trial-and-error basis. ....

Ultimately the proof is in the pudding. We have striven to show that machine-learning methods yield good puddings and are elegant at the same time.

....We have tried to provide a succinct description of the main biological concepts and problems for the readers with a stronger background in mathematics, statistics and computer science. Likewise, the book is tailored to biologists and biochemists who will often know more about the biological problems than the text explains, but need some help to understand the new data-driven algorithms, in the context of biological data. ....
The technical prerequisites for the book are basic calculus, algebra, and discrete probability theory, at the level of an undergraduate course.

Pierre Baldi
Soren Brunak
MIT Press
-------------------------------------------------------------------------------------
Anyone with the prerequisites should/could start reading about this exciting field and maybe even apply some of the ideas to linguistics.

-- M. Hubey

From hubeyh at mail.montclair.edu Mon Jan 20 01:08:01 2003
From: hubeyh at mail.montclair.edu (H.M.
Hubey)
Date: Sun, 19 Jan 2003 20:08:01 -0500
Subject: [language] Speech Recognition book

-------------------------Yet Another Book---------------------
The text concentrates on those basic statistical ideas that have proven so fruitful in speech recognition: hidden Markov models, data clustering, smoothing of probability distributions, the decision tree method of equivalence classification, the use of information measures as goodness criteria, the maximum entropy probability estimation. The aim is clarity, conciseness, and a unified point of view. ...

There are a few explicit prerequisites to understanding the book's contents. Some familiarity with probability, college level mathematics, and (mainly) maturity are required. We progress step by step, trying to be self-contained and appealing to common sense. No advanced theorems are invoked, not even the central limit theorem, or the theory of matrices with nonnegative elements. The book tries to introduce methods heuristically, sometimes delaying mathematical proofs, often eschewing them altogether.

Frederick Jelinek
Statistical Methods for Speech Recognition
MIT Press, 1997

-- M. Hubey
URL: From hubeyh at mail.montclair.edu Thu Jan 2 03:02:07 2003 From: hubeyh at mail.montclair.edu (H.M.Hubey) Date: Wed, 1 Jan 2003 22:02:07 -0500 Subject: [language] quant linguistics Message-ID: <><><><><><><><><><><><>--This is the Language List--<><><><><><><><><><><><><> This is a multi-part message in MIME format. -------------- next part -------------- http://www.uta.edu/english/tim/courses/4301f98/linglinx/gen.html -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ---<><><><><><><><><><><><>----Language----<><><><><><><><><><><><><> Copyrights/"Fair Use": http://www.templetons.com/brad/copymyths.html The "fair use" exemption to copyright law was created to allow things such as commentary, parody, news reporting, research and education about copyrighted works without the permission of the author. That's important so that copyright law doesn't block your freedom to express your own works -- only the ability to express other people's. Intent, and damage to the commercial value of the work are important considerations. You are currently subscribed to language as: language at listserv.linguistlist.org To unsubscribe send a blank email to leave-language-4283Y at csam-lists.montclair.edu From hubeyh at mail.montclair.edu Thu Jan 2 17:20:50 2003 From: hubeyh at mail.montclair.edu (H.M.Hubey) Date: Thu, 2 Jan 2003 12:20:50 -0500 Subject: [language] The Significance of Word Lists Message-ID: <><><><><><><><><><><><>--This is the Language List--<><><><><><><><><><><><><> This is a multi-part message in MIME format. -------------- next part -------------- http://csli-publications.stanford.edu/site/1575863006.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- ---<><><><><><><><><><><><>----Language----<><><><><><><><><><><><><> Copyrights/"Fair Use": http://www.templetons.com/brad/copymyths.html The "fair use" exemption to copyright law was created to allow things such as commentary, parody, news reporting, research and education about copyrighted works without the permission of the author. That's important so that copyright law doesn't block your freedom to express your own works -- only the ability to express other people's. Intent, and damage to the commercial value of the work are important considerations. You are currently subscribed to language as: language at listserv.linguistlist.org To unsubscribe send a blank email to leave-language-4283Y at csam-lists.montclair.edu From hubeyh at mail.montclair.edu Wed Jan 8 15:12:53 2003 From: hubeyh at mail.montclair.edu (H.M. Hubey) Date: Wed, 8 Jan 2003 10:12:53 -0500 Subject: [language] [Fwd: Did Early Humans Mate With The Locals? Human Genome Data Cast Doubt On "Replacement Theory" Of Human Evolution] Message-ID: <><><><><><><><><><><><>--This is the Language List--<><><><><><><><><><><><><> Source: University Of Utah Date: 2002-12-26 Did Early Humans Mate With The Locals? Human Genome Data Cast Doubt On "Replacement Theory" Of Human Evolution A new analysis of human genetic history deals a blow to the theory that early people moved out of Africa and completely replaced local populations elsewhere in the world. The findings suggest there was at least limited interbreeding between our African ancestors and the residents of areas where they settled. "The new data seem to suggest that early human pioneers moving out of Africa starting 80,000 years ago did not completely replace local populations in the rest of the world," says Henry Harpending, a University of Utah anthropology professor and co-author of the new study. "There is instead some sign of interbreeding." 

If that conclusion is correct, it contradicts the "replacement theory" of human evolution - a theory Harpending has advocated for more than a decade. "Hypotheses are called into question by data every day in science. That's the way it works," he says.

The journal Proceedings of the National Academy of Sciences is publishing the new findings in its online edition the week of Dec. 23, 2002.

The study's 20 co-authors include three from the University of Utah: Harpending; Alan Rogers, also a professor of anthropology; and Stephen Wooding, a postdoctoral researcher in human genetics. The study was led by anthropologist Stephen Sherry and mathematician Gabor Marth of the National Center for Biotechnology Information at the National Institutes of Health in Bethesda, Md. Sherry is a former student of Harpending's from when both were at Pennsylvania State University. Other co-authors of the new study are from the Washington University School of Medicine in St. Louis, The Johns Hopkins University School of Medicine in Baltimore and the University of California, San Francisco.

Most anthropologists agree human ancestors first spread out of Africa roughly 1.8 million years ago, establishing new populations in Europe, Asia and elsewhere. The "multiregional theory" holds modern humans evolved from those multiple populations. The competing "replacement theory" says that the local populations, including Europe's Neanderthals, went extinct when they were replaced roughly between 80,000 and 30,000 years ago by another wave of human immigrants from Africa.

Scientists can analyze ancient genetic mutations in modern people to learn about how humans evolved and the size of the human population over time. Mutations occur at a relatively steady rate over time. If the human population were large at a specific point in prehistoric time, more mutations would occur, resulting in greater diversity in genetic mutations found in modern people.
A small population of human ancestors would result in fewer mutations, so modern humans would display less genetic diversity. So a person's genetic material "contains the whole history of the population from which you descended," Harpending says.

Earlier studies of genetic material known as mitochondrial DNA and microsatellites supported the notion that a small group of perhaps 5,000 to 20,000 primitive humans migrated from East Africa, spread around the world, and rapidly expanded in population as they replaced other human populations elsewhere in Africa 80,000 years ago, and in Asia 50,000 years ago and Europe about 35,000 years ago.

The new study, however, analyzed mutations called SNPs (single nucleotide polymorphisms) in DNA from the nucleus of human cells studied for the Human Genome Project, the effort to map the entire human genetic blueprint.

The analysis indicates there was a bottleneck in the human population - what looks like a sharp reduction in the number of people - when ancestors of modern humans colonized Europe roughly 40,000 years ago. Researchers are not sure what this means because it conflicts with studies of other kinds of human genetic information, which support the idea that a rapidly expanding African population spread globally and replaced local populations elsewhere.

"If Africans moved out of Africa and then populated the whole world, we would see that in the genetic evidence as an expansion in population size," yet the new study indicated the population shrank instead, Rogers says.

The evidence five years ago indicated migrating Africans did not interbreed with local populations, while the new study indicates they did, Rogers notes, adding that the conflicting genetic data mean "the question is still open."
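
The link the article draws between population size and genetic diversity can be sketched with a standard textbook relation from population genetics. It is not taken from the study itself: under the neutral model for a diploid population at equilibrium, the expected number of pairwise differences per site is roughly theta = 4 * Ne * mu, where Ne is the effective population size and mu the per-site, per-generation mutation rate. The function name and the numeric values below are illustrative assumptions, not figures from the release.

```python
def expected_diversity(effective_size: float, mutation_rate: float) -> float:
    """Expected pairwise diversity per site, theta = 4 * Ne * mu.

    Textbook neutral-model relation for a diploid population at
    equilibrium; a rough guide, not the method used in the study.
    """
    return 4.0 * effective_size * mutation_rate


# Assumed per-site, per-generation mutation rate (illustrative only).
MU = 1e-8

# Larger effective populations are expected to carry more diversity.
for ne in (5_000, 20_000, 100_000):
    print(f"Ne = {ne:>7}: theta = {expected_diversity(ne, MU):.1e}")
```

Read in the other direction, unusually low observed diversity suggests a small effective population at some point in the past, which is the kind of signal the article describes as a bottleneck.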

Harpending says one possible explanation for the new data is that there was a large population of humans who migrated from Africa, yet they kept largely to themselves and mated only to a limited extent with local populations in Europe and elsewhere. Because interbreeding still was uncommon, only a few of the prehistoric European genes were incorporated into the modern human genetic blueprint, giving a false impression that the prehistoric human population collapsed or shrank in size, Harpending says.

Another possibility is that the prehistoric African population was large 100,000 years ago, but only a very small number - perhaps a few dozen - of those Africans migrated to other areas some 80,000 years ago, ultimately replacing local populations. That would explain why the human genetic blueprint could give a false impression that the human population collapsed in size even if it did not. But Harpending believes it is unlikely that such a small number of migrants from Africa could spread globally and ultimately replace other populations.

Editor's Note: The original news release can be found here.
http://www.utah.edu/unews/releases/02/dec/genome.html
-------------------------------------------------------------------------------
Note: This story has been adapted from a news release issued for journalists and other members of the public. If you wish to quote any part of this story, please credit University Of Utah as the original source. You may also wish to include the following link in any citation:
http://www.sciencedaily.com/releases/2002/12/021226071610.htm

News in Brain and Behavioural Sciences - Issue 81 - 14th December, 2002
http://human-nature.com/nibbs/issue81.html

M. Hubey
-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o
The only difference between humans and machines is that humans can be created by unskilled labor. Arthur C.
Clarke
/\/\/\/\//\/\/\/\/\/\/ http://www.csam.montclair.edu/~hubey

From hubeyh at mail.montclair.edu Sun Jan 19 05:27:02 2003
From: hubeyh at mail.montclair.edu (H.M. Hubey)
Date: Sun, 19 Jan 2003 00:27:02 -0500
Subject: [language] Chuvash initial-l, and Common Turkic initial-t

I have a set of cognates from the two Turkic families. I got the Chuvash words from Paasonen, but I could only get the Turkish translation, so I had to translate them back into English. I am sure they are cognates. But the only time I have seen t=l is in Hittite tabarna=labarna. More strange parallels follow. I have looked at Watkins and Buck, and Bomhard's Nostratic books. I am still not sure about certain things. Here they are:

1.) Chuvash lapatka bileyi tahtas?; kenarlar?na katran s?r?lm?s,, sapl? bir tahtad?r ki ?zerindeki t?rpanlar bilenir; ?amas,?r tokmag(? [rus?as?:???????] [Paasonen08:82];
    This is a board for sharpening scythes. It is also used for beating (washing) clothes.
    Turkic tapla (to sharpen); found at least in Karachay-Balkar

2) Chuvash lapa(rdat suda ?alkalanmak [Paasonen08:82];
    To be rinsed in water (that is what it says)
    Turkic shIpIrdat to make gurgling noises

3.a) Chuvash lapa(rkka ?amur (yollarda); kir, pislik, deg(ersiz, is,e yaramaz [Paasonen08:82];
    (mud, dirt, worthless)
    Turkic toprak dirt, earthen
    The -ge suffix is an ancient Turkic suffix (Clauson).

3.b) Chuvash lapra kir, ?amur [Paasonen08:83];
    (dirt, mud)
    It looks like Chuvash retains both versions. It looks like the word for "earth" in Turkic should be more like *torpang so that the original root *tor becomes cognate with Latin terra, Sumerian tir, Turkic toz (dust), Turkic tuz (salt), Turkic turi (sour), etc. Common Turkic for "land, earth" is cer/yer, but the initial *c/y are reconstructed as deriving from *d (Doerfer) thus der also fits into the same scheme. This word has been found in runic inscriptions in the North Caucasus (Tavkul...)

4) Chuvash lapa(stat g?r?lt? ile vurmak, kakmak [Paasonen08:82];
    (to hit, push, shove noisily)
    Turkic tep to kick, hit, move; It shows up in various words tekme (<*tepme) a kick, tebre (tepre) to move, to kick, etc.

5) Chuvash lac(ag^a ?amur [Paasonen08:82];
    Shows up in Turkish as lachka, i.e. evidently a borrowing

6) Chuvash la?an ?amas,?r teknesi, leg(en [Kaz. la?an] [Paasonen08:82];
    tok, tegen, ogen (a trough for washing clothes)
    It shows up in Turkic as legen, another borrowing. It shows up also in words like tegene (a tub, basin) tekne (trough). It seems to be related to to"k (to pour).

7) Chuvash laja at [Kaz. alas,a] [Paasonen08:82];
    (horse. Shows up in Kruger's book as loshad)
    Turkic languages (along with some Uralic languages) have it as alasha. The word seems to derive from en/in (to go down), which also shows up in Turkish as o"n (front), e.g. o"nde (in front of). The words al/il show up as "front" (alda, ileri, etc) but as "low" as in alasha, and Turkish alchak.
    I derived these and related them to "donkey", ass, anshe, etc. once on sci.lang, e.g. alasha, alashang, onos, onager, donkey > doneg.
    http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3DBF1AB4.7070205%40mail.montclair.edu&prev=/groups%3Fq%3Dalashang%2Bgroup:sci.lang.*%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D3DBF1AB4.7070205%2540mail.montclair.edu%26rnum%3D1

8a) Chuvash lap ova, d?z yer [Bas,k. K. lapak al?ak] [Paasonen08:83];
    (plain, flat ground)
    But in K Bashk apparently lapak means "low". But in Turkic the t-words are tobbe (top), tepe (hill), tu"b (bottom), to"ben (lower)

8b) Chuvash le(pke bas,?n tepesi [Paasonen08:85];
    (the top of the head) see others above

9) Chuvash lapka sulu kar [Paasonen08:83];
    (wet snow)

10) Chuvash lar oturmak [Soy. Uyg. R. olur, Yak. olor, Kom. Krm. Kumd. Uyg. R. oltur, Kaz ut?r] [Paasonen08:83];
    (to sit, to dwell?)
    Turkic tur to stay, to remain, to stand, to dwell
    Turkic tu"sh to fall down, go down, sit

11) Chuvash lavkka d?kan [Paasonen08:83];
    (store)
    This means that Turkic dukkan, KBal tuken are directly from protoTurkic and lavka is not Russian.

12) Chuvash lek as?l? durmak, bulunmak; ?atmak, isabet etmek [Kaz. lek, ?lek] [Paasonen08:84];
    (to remain hanging, to be found, to hit)
    Turkic as to hang e.g. from a wall or something
    Turkic il to hang onto, to connect to something e.g. ilik button-hole, ilin to latch onto something
    Turkic tak to hang (transitive)
    This last one is constructed like yak (to burn, trans) vs yan (to burn, intrans) from *ya
    Other verbs formed similarly: yIk (to knock down, pull down), bak (to look), to"k (to pour, see above)
    Strangely enough, Hittite lak (to burn), and Hittite luk (to knock down)
    (Are the last two Hittite words from IE roots?)

13) Chuvash lere orada [Paasonen08:84];
    (there, at there)
    Turkic deri/teri "up to [there]"
    KBal alayda "there"
    It is very strange because I constructed protoTurkic *th so that *th>l and *th>t to explain these but in addition to Hittite l-words now we get English *th words, e.g. lere=there

14) Chuvash les' iletmek, g?t?rmek, tas,?mak [Kaz. ilt, it; Kaz. Tob. Ba(r. R. ilt; Uyg. elt; Yak ilt] [Paasonen08:84];
    (to forward, to carry)
    Turkic tashI to carry
    Hittite tarna to carry! Another strange parallel.
    Turkish tash (to overflow); Turkic tIsh (outside) e.g. overflow=go outside the bucket
    Connected with the scapegoat from Hittite.

15) Chuvash la(g^a kakmak, tak?rt? etmek [Paasonen08:84];
    (to hit, shove, make noises e.g. takIrtI)
    Turkic toku to beat up
    Turkic tu"y, do"v, tu"g to beat up
    Turkic tayak stick,

16) Chuvash la(plan dinmek (r?zgar hakk?nda); yavas,lamak [Paasonen08:84];
    KBal tIn to calm down, to be quiet, calm

17) Chuvash la(rg^a m?r?ldan, m?zm?z [Paasonen08:85];
    (to complain, )
    KBal tarIghIrgha < * targa? to complain (Irgha is the infinitive case)

18) Chuvash la(sta(rdat titremek, sallanmak [Paasonen08:85];
    (to shake, to swing)
    Turkish titre to shake, to tremble
    KBal tentire to tremble

19) Chuvash la(ymag^a salya [Paasonen08:85];
    (spittle, saliva, sputum)

20) Chuvash le(g^e( kepek, tortu [Paasonen08:85];

21) Chuvash le(p ?l?k [Paasonen08:85];
    (warm, tepid)
    Turkic tamId to blaze up and many other such words about heat, warmth, iron-working, etc. including words with the shift t>k e.g. kabIn (to catch fire) from apparently something like *tab.

It looks like *th>l and *th > t. Perhaps those with knowledge of the present state of *PIE and *PAA can give me hints as to the most likely phonetic realization of the *th.

--
M. Hubey
-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o
The only difference between humans and machines is that humans can be created by unskilled labor. Arthur C.
Clarke
/\/\/\/\//\/\/\/\/\/\/ http://www.csam.montclair.edu/~hubey

From hubeyh at mail.montclair.edu Mon Jan 20 00:23:15 2003
From: hubeyh at mail.montclair.edu (H.M. Hubey)
Date: Sun, 19 Jan 2003 19:23:15 -0500
Subject: [language] Datamining, Statistics and Linguistics

This excerpt is from a brand new and very influential book. The Preface explains why this book was necessary.

-------------------begin here------------------------
The field of statistics is constantly challenged by the problems that science and industry brings to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded in both size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of "datamining"; statistical and computational problems in biology and medicine have created "bioinformatics".
Vast amounts of data are being generated in many fields, and the statistician's job is to make sense of it all: extract important patterns and trends, and understand "what the data says." We call this learning from data.

The challenges from data have led to a revolution in the statistical sciences. Since computation plays such a key role, it is not surprising that much of this new development has been done by researchers in other fields such as computer science and engineering.

The learning problems that we consider can be roughly categorized as either supervised or unsupervised. In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures; in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures.

This book is our attempt to bring together many of the new ideas in learning, and explain them in a statistical framework. While some mathematical details are needed, we emphasize the methods and their conceptual underpinnings rather than their theoretical properties. As a result, we hope that this book will appeal not just to statisticians but also to researchers and practitioners in a wide variety of fields.

Just as we have learned a great deal from researchers outside the field of statistics, our statistical viewpoint may help others to better understand different aspects of learning:

    There is no true interpretation of anything; interpretation is a vehicle in the service of human comprehension. The value of interpretation is in enabling others to fruitfully think about an idea.
    Andreas Buja

We would like to acknowledge.......

Trevor Hastie
Robert Tibshirani
Jerome Friedman
May 2001

--
M. Hubey
-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o
The only difference between humans and machines is that humans can be created by unskilled labor. Arthur C.
Clarke
/\/\/\/\//\/\/\/\/\/\/ http://www.csam.montclair.edu/~hubey

From hubeyh at mail.montclair.edu Mon Jan 20 00:59:28 2003
From: hubeyh at mail.montclair.edu (H.M. Hubey)
Date: Sun, 19 Jan 2003 19:59:28 -0500
Subject: [language] Another influential book

Another highly influential book in the forefront. It is a part of a series.

------------------------------Series Foreword------------------
Like bioinformatics, the field of machine learning is interdisciplinary. The goal of building computer systems that can adapt to environments and learn from experience has attracted researchers from many fields, including computer science, engineering, mathematics, physics, neuroscience, and cognitive science. Out of this research has come a variety of techniques that have the potential to transform many scientific and industrial fields. Several research communities have converged on a common set of issues surrounding supervised, unsupervised, and reinforcement learning problems...
Thomas Dietterich

------------------------------Preface ------------------------------------------
In all areas of biological and medical research, the role of the computer has been dramatically enhanced in the last five to ten year period.... The main driving force behind the changes has been the advent of new, efficient experimental techniques, primarily DNA sequencing, that have led to an exponential growth of linear descriptions of protein, DNA and RNA molecules. .....As a result, computational support in experimental design, processing of results and interpretation of results has become essential....

The large amounts of data create a critical need for theoretical, algorithmic, and software advances in storing, retrieving, networking, processing, analyzing, navigating, and visualizing biological information. In turn, biological systems have inspired computer science advances with new concepts, including genetic algorithms, artificial neural networks, computer viruses, and synthetic immune systems, DNA computing, artificial life, and hybrid VLSI-DNA gene chips. This cross-fertilization has enriched both fields and will continue to do so in the coming decades. In fact, all the boundaries between carbon-based and silicon-based information processing systems, whether conceptual or material, have begun to shrink. ...

Bioinformatics has emerged as a strategic discipline at the frontier between biology and computer science, impacting medicine, biotechnology, and society in many ways. ...

....conventional computer science algorithms...are increasingly unable to address many of the most interesting sequence analysis problems. This is due to the inherent complexity of biological systems, brought about by evolutionary tinkering, and our lack of a comprehensive theory of life's organization at the molecular level. Machine-learning approaches (e.g. neural networks, hidden Markov models, support vector machines, belief networks), on the other hand, are ideally suited for domains characterized by the presence of large amounts of data, "noisy" patterns, and the absence of general theories. The fundamental idea behind these approaches is to learn the theory automatically from the data, through a process of inference, model fitting, or learning from examples. They form a viable complementary approach to conventional methods. The aim of this book is to present a broad overview of bioinformatics from a machine-learning perspective. ...

An often-met criticism of machine-learning techniques is that they are "black-box" approaches: one cannot always pin down exactly how a complex neural network, or hidden Markov model, reaches a particular answer. We have tried to address such legitimate concerns both within the general probabilistic framework and from a practical standpoint. It is important to realize, however, that many other techniques in contemporary molecular biology are used on a purely empirical basis. The polymerase chain reaction, for example, for all its usefulness and sensitivity, is still somewhat of a black-box technique. Many of its adjustable parameters are chosen on a trial-and-error basis. ....

Ultimately the proof is in the pudding. We have striven to show that machine-learning methods yield good puddings and are elegant at the same time.

....We have tried to provide a succinct description of the main biological concepts and problems for the readers with a stronger background in mathematics, statistics and computer science. Likewise, the book is tailored to biologists and biochemists who will often know more about the biological problems than the text explains, but need some help to understand the new data-driven algorithms, in the context of biological data. ....

The technical prerequisites for the book are basic calculus, algebra, and discrete probability theory, at the level of an undergraduate course.

Pierre Baldi
Soren Brunak
MIT Press
-------------------------------------------------------------------------------------
Anyone with the prerequisites should/could start reading about this exciting field and maybe even apply some of the ideas to linguistics.

--
M. Hubey
-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o
The only difference between humans and machines is that humans can be created by unskilled labor. Arthur C. Clarke
/\/\/\/\//\/\/\/\/\/\/ http://www.csam.montclair.edu/~hubey
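
As one small illustration of applying simple counting ideas to linguistic data, the Chuvash initial l- / Common Turkic initial t- list posted earlier in this archive can be tabulated mechanically. The sketch below is an assumption-laden toy: the pairs are a subset copied from that post in its ASCII transliteration, the function name `initial_correspondences` is hypothetical, and comparing only the first letter is a crude stand-in for a real phonological alignment (for instance, "sh" is treated as "s"). A tally like this shows how regular a proposed correspondence is within a hand-picked list; it does not by itself show that the words are cognate.

```python
from collections import Counter

# (Chuvash form, proposed Turkic counterpart) pairs, copied from the
# cognate list in the post; ASCII transliteration as posted.
# Only a subset of the list is included here.
PAIRS = [
    ("lapatka", "tapla"),
    ("lapa(rdat", "shIpIrdat"),
    ("lapa(rkka", "toprak"),
    ("lapa(stat", "tep"),
    ("lar", "tur"),
    ("lek", "tak"),
    ("les'", "tashI"),
    ("la(g^a", "toku"),
    ("la(plan", "tIn"),
    ("la(sta(rdat", "titre"),
]


def initial_correspondences(pairs):
    """Count (Chuvash initial letter, Turkic initial letter) pairs."""
    return Counter((chuvash[0], turkic[0]) for chuvash, turkic in pairs)


counts = initial_correspondences(PAIRS)
for (chuvash_initial, turkic_initial), n in counts.most_common():
    print(f"Chuvash {chuvash_initial}- : Turkic {turkic_initial}- x {n}")
```

A fuller treatment would align whole words segment by segment, along the lines of the alignment papers linked in the earlier "Computers in historical linguistics" post, and would compare the counts against what chance resemblance alone would produce.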