From hdls at unm.edu Mon Aug 2 18:54:32 2004 From: hdls at unm.edu (High Desert Linguistics Society) Date: Mon, 2 Aug 2004 12:54:32 -0600 Subject: Second call for the HDLS-6 Linguistics Conference (Nov. 4-6, 2004) at the Univ of New Mexico Message-ID: Please note that the conference will take place from November 4th to the 6th and not the 3rd - 5th as indicated in the first posting. -------------------------------------------------- The Sixth High Desert International Linguistics Conference will be held at the University of New Mexico, Albuquerque, NM, November 4-6, 2004. We invite you to submit proposals for 20-minute talks with 10-minute discussion sessions in any area of linguistics - especially those from a cognitive / functional linguistics perspective. Papers in the following areas are particularly welcome: Evolution of language, Grammaticization, Metaphor & Metonymy, Language change & variation, Sociolinguistics, Bilingualism, Signed languages, Gesture, Native American languages, Language acquisition and Computational Linguistics. The deadline for submitting abstracts is September 3rd, 2004. Abstracts should be sent via email, as an attachment, to hdls at unm.edu. Please include the title "HDLS-6 abstract" in the subject line. MS-Word format is preferred, or RTF if necessary. The e-mail and attached abstract must include the following: 1. Author's Name(s) 2. Author's Affiliation(s) 3. Title of the Paper 4. E-mail address of the primary author The abstract should be no more than one page, in a font no smaller than 11 point. A second page is permitted for references and/or data. Only two submissions per author will be accepted and we will only consider submissions that conform to the above guidelines. Notification of acceptance will be sent out by the evening of September 5th, 2004. If you have any questions or need further information, please contact us at hdls at unm.edu with "HDLS-6 Conference" in the subject line. 
From spike at darkwing.uoregon.edu Wed Aug 4 20:52:00 2004 From: spike at darkwing.uoregon.edu (Spike Gildea) Date: Wed, 4 Aug 2004 13:52:00 -0700 Subject: Call for Papers Message-ID: 79th Annual Meeting of the Linguistic Society of America 6-9 January 2005 San Francisco, CA Contact: MReynolds at lsadc.org Further information: http://www.lsadc.org Abstract Deadline: 1 SEPTEMBER 2004 Meeting Description The 79th Annual Meeting of the Linguistic Society of America will be held at the Hyatt Regency San Francisco, 6-9 January 2005. The American Dialect Society, the American Name Society, the North American Association for the History of the Language Sciences, the Society for Pidgin and Creole Linguistics, and the Society for the Study of the Indigenous Languages of the Americas will meet concurrently with the LSA. On the program will be plenary presentations by Penny Eckert, Victor Golla, Peter Ladefoged, and George Lakoff. The Presidential Address will be given by Joan Bybee. Call for Papers All members of the LSA are invited to submit abstracts for 15-minute, 30-minute, and poster presentations. Membership is a requirement for submitting and presenting; dues for 2004 ($80 US regular, $40 US student; $100 non-US regular, $60 non-US student) may accompany submissions. Submittal forms and the guidelines and specifications for abstracts may be found in the June LSA Bulletin or at http://www.lsadc.org. The guidelines for abstract preparation must be rigorously adhered to for the abstracts to be considered by the Program Committee. The deadline for receipt of all abstracts is 1 September 2004, 5:00 PM EDT. Submissions should be addressed to: LSA Secretariat, 1325 18th Street, NW, Suite 211, Washington, DC 20036-6501. Members are advised that post office delivery, including express mail and priority mail, is erratic. Abstracts received after the deadline will not be considered and will be returned to the authors. Strict enforcement of this deadline is necessary. 
From francisco.ruiz at dfm.unirioja.es Fri Aug 6 00:02:22 2004 From: francisco.ruiz at dfm.unirioja.es (Francisco Ruiz de Mendoza) Date: Fri, 6 Aug 2004 01:02:22 +0100 Subject: ARCL-3 Call for papers Message-ID: The Annual Review of Cognitive Linguistics (published under the auspices of the Spanish Cognitive Linguistics Association) aims to establish itself as an international forum for the publication of high-quality original research on all areas of linguistic enquiry from a cognitive perspective. Fruitful debate is encouraged with neighboring academic disciplines as well as with other approaches to language study, particularly functionally-oriented ones. Submissions for ARCL-3 (2005) should be received before November 30, 2004. For submission guidelines, visit: http://www.unirioja.es/dptos/dfm/ARCLGuidelines.pdf For general information, go to: http://www.benjamins.com/cgi-bin/t_seriesview.cgi?series=ARCL Francisco J. RUIZ DE MENDOZA- Editor Universidad de La Rioja Departamento de Filologías Modernas Edificio de Filología c/San José de Calasanz s/n Campus Universitario 26004, Logroño, La Rioja, Spain Tel.: 34 (941) 299433 / (941) 299430 FAX.: 34 (941) 299419 e-mail: francisco.ruiz at dfm.unirioja.es From Salinas17 at aol.com Fri Aug 20 13:44:49 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Fri, 20 Aug 2004 09:44:49 EDT Subject: On the Relativity Front Message-ID: COGNITION: Life Without Numbers in the Amazon (p. 1093) --------------------------------------------------------------------------- Constance Holden In an article published online this week by Science (www.sciencemag.org/cgi/content/abstract/1094492), a psycholinguist demonstrates that among members of a tiny tribe in the Amazon jungle that has no words for numbers beyond two, the ability to conceptualize numbers is no better than it is among pigeons, chimps, or human infants. 
Full story at http://www.sciencemag.org/cgi/content/full/305/5687/1093a?etoc ------------------------------------ But of course this not eliminate the possibility that the number two is one of those prelinguistic schema. Regards, Steve Long From Salinas17 at aol.com Fri Aug 20 15:55:37 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Fri, 20 Aug 2004 11:55:37 EDT Subject: Fwd: [FUNKNET] On the Relativity Front Message-ID: From Salinas17 at aol.com Fri Aug 20 19:19:13 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Fri, 20 Aug 2004 15:19:13 EDT Subject: FWD: On the Relativity Front (2) Message-ID: Forwarded Message: Subj: Re: [FUNKNET] On the Relativity Front Date: Friday, August 20, 2004 9:56:04 AM From: daniel.everett at uol.com.br To: Salinas17 at aol.com cc: dan.everett at man.ac.uk, funknet at rice.edu From: daniel.everett at uol.com.br (Daniel Everett) To: Salinas17 at aol.com CC: dan.everett at man.ac.uk (Daniel L.Everett), funknet at rice.edu If anyone wants to discuss this case, about Piraha, further. I would be happy to do so. Peter Gordon did this work with me, originally going to run experiments to check out and refine my own anecdotes about lack of counting and numbers in Piraha. Gordon's conclusion is Whorfian. However, I disagree with this. In a larger article on my website - 'Cultural Constraints on Grammar and Cognition in Piraha', I discuss the number findings in a larger context, arguing that there are several ways in which Piraha culture constrains grammar (Piraha is also, I claim, the only language without embedding, for example, it lacks color words, it has the simplest kinship system known, the smallest phonemic inventory, etc.). I offer a single account for all of this based on the cultural constraint against talking about things outside the immediate experience of members of the community. Dan On 20 Aug 2004, at 10:44, Salinas17 at aol.com wrote: > COGNITION: Life Without Numbers in the Amazon (p. 
1093) > ----------------------------------------------------------------------- > ---- > Constance Holden > > In an article published online this week by Science > (www.sciencemag.org/cgi/content/abstract/1094492), a psycholinguist > demonstrates that among members of a tiny tribe in the Amazon jungle > that > has no words for numbers beyond two, the ability to conceptualize > numbers > is no better than it is among pigeons, chimps, or human infants. > Full story at > http://www.sciencemag.org/cgi/content/full/305/5687/1093a?etoc > ------------------------------------ > But of course this not eliminate the possibility that the number two > is one > of those prelinguistic schema. > > Regards, > Steve Long > > ------------------------------- Daniel L. Everett Professor of Phonetics and Phonology Postgraduate Programme Director Department of Linguistics and English Language University of Manchester Manchester M13 9PL UK Fax: 44 161 275 3187 Phone: 44 161 275 3158 http://ling.man.ac.uk/info/staff/DE/DEHome.html From sathomps at linguistics.ucsb.edu Sat Aug 21 21:03:13 2004 From: sathomps at linguistics.ucsb.edu (Sandra Thompson) Date: Sat, 21 Aug 2004 14:03:13 -0700 Subject: UCSB job: computational/corpus linguistics Message-ID: The Linguistics Department of the University of California, Santa Barbara seeks to hire a specialist in computational and/or corpus linguistic approaches to language. The appointment will be tenure-track at the Assistant Professor level, effective July 1, 2005. We are especially interested in candidates whose research shows theoretical implications bridging computational and/or corpus linguistics and general linguistics, and who can interact with colleagues and students across disciplinary boundaries at UCSB. Candidates will be preferred whose research engages with the departmental focus on functional and usage-based approaches to explaining language. Research experience with corpora of naturally occurring language use is required. 
Candidates must have demonstrated excellence in teaching, and will be expected to teach a range of graduate and undergraduate courses in both computational/corpus linguistics and general linguistics. Ph.D. in linguistics or a related field such as cognitive science or computer science is required. Ph.D. normally required by the time of appointment. Applicants should submit hard copy of curriculum vitae, statement of research interests, 1-2 writing samples, and full contact information for three academic references to the Search Committee, Linguistics Department, UCSB, Santa Barbara, CA 93106-3100. Fax and email applications not accepted. Inquiries may be addressed to the above address or via email to lingsearch at linguistics.ucsb.edu. Tentative deadline is November 12, 2004. However, the position will remain open until filled. Preliminary interviews will be conducted at the Linguistic Society of America, although attendance is not required for consideration. The department is especially interested in candidates who can contribute to the diversity and excellence of the academic community through research, teaching and service. UCSB is an Equal Opportunity/Affirmative Action employer. From hdls at unm.edu Wed Aug 25 00:07:33 2004 From: hdls at unm.edu (High Desert Linguistics Society) Date: Tue, 24 Aug 2004 18:07:33 -0600 Subject: HDLS-6 Linguistics Conference (Nov. 4-6, 2004) Keynote speakers Message-ID: We are very pleased to announce that Joan Bybee (University of New Mexico), David McNeill (University of Chicago), and Suzanne Kemmer (Rice University) have accepted our invitations to deliver keynote addresses at The Sixth High Desert Linguistics Conference, November 4-6, 2004, at the University of New Mexico, Albuquerque, NM. 
We invite you to submit proposals for 20-minute talks with 10-minute discussion sessions in any area of linguistics - especially those from a cognitive / functional linguistics perspective. Papers in the following areas are particularly welcome: Evolution of language, Grammaticization, Metaphor & Metonymy, Language change & variation, Sociolinguistics, Bilingualism, Signed languages, Gesture, Native American languages, Language acquisition and Computational Linguistics. The deadline for submitting abstracts is September 3rd, 2004. Abstracts should be sent via email, as an attachment, to hdls at unm.edu. Please include the title "HDLS-6 abstract" in the subject line. MS-Word format is preferred, or RTF if necessary. The e-mail and attached abstract must include the following: 1. Author's Name(s) 2. Author's Affiliation(s) 3. Title of the Paper 4. E-mail address of the primary author The abstract should be no more than one page, in a font no smaller than 11 point. A second page is permitted for references and/or data. Only two submissions per author will be accepted and we will only consider submissions that conform to the above guidelines. Notification of acceptance will be sent out by the evening of September 5th, 2004. If you have any questions or need further information, please contact us at hdls at unm.edu with "HDLS-6 Conference" in the subject line. From Julia.Ulrich at degruyter.com Wed Aug 25 09:05:05 2004 From: Julia.Ulrich at degruyter.com (Julia Ulrich) Date: Wed, 25 Aug 2004 11:05:05 +0200 Subject: Inaugural Issue of Intercultural Pragmatics is now available Message-ID: Mouton de Gruyter is proud to announce the publication of INTERCULTURAL PRAGMATICS Edited by Istvan Kecskes The articles of the first issue are available as free downloads at http://www.degruyter.de/journals/intcultpragm/intcultpragm1_1.html ISTVAN KECSKES Editorial: Lexical merging, conceptual blending, and cultural crossing JACOB L. MEY Between culture and pragmatics: Scylla and Charybdis? 
The precarious condition of intercultural pragmatics JACQUES MOESCHLER Intercultural pragmatics: a cognitive approach BERT PEETERS ''Thou shalt not be a tall poppy'': Describing an Australian communicative (and behavioral) norm Interview JÓZSEF ANDOR The master and his performance: An interview with Noam Chomsky Forum RACHEL GIORA On the Graded Salience Hypothesis GABRIELE KASPER Speech acts in (inter)action: Repeated questions JAN NUYTS The cognitive-pragmatic approach. Contributors to this issue For more information, please visit http://www.degruyter.de/rs/384_7078_ENU_h.htm To order please contact SFG Servicecenter-Fachverlage Postfach 4343 72774 Reutlingen, Germany Fax: +49 (0)7071 - 93 53 - 33 E-mail: deGruyter at s-f-g.com For USA, Canada and Mexico: Walter de Gruyter, Inc. 200 Saw Mill River Road Hawthorne, NY 10532, USA Fax: +1 (914) 747-1326 E-mail: cs at degruyterny.com Please visit our website for other publications by Mouton de Gruyter: http://www.mouton-publishers.com 
From geoffnathan at wayne.edu Wed Aug 25 13:26:04 2004 From: geoffnathan at wayne.edu (Geoff Nathan) Date: Wed, 25 Aug 2004 09:26:04 -0400 Subject: Phonology Theme Session--Reminder Message-ID: Just a reminder that the deadline for abstract submission is rapidly approaching... Phonology in the Cognitive Grammar Worldview a theme session within the Ninth International Cognitive Linguistics Conference Yonsei University in Seoul, Korea. 17-22nd of July 2005 We are seeking abstracts for papers exploring how fundamental principles of Cognitive Grammar (prototype theory, experiential grounding--‘embodiment’, principles of categorization, including the concept of the ‘basic level,’ and usage-based theories) can elucidate the organization of phonology in Language (either spoken or signed). We invite abstracts from researchers in all areas of cognitive linguistics and related frameworks who are interested in the way the purely form-oriented, physical aspect of language is perceived, categorized, organized and produced. Abstracts should be between 300 and 500 words, and clearly display their relevance to the topic. Abstracts should be submitted electronically (RTF or PDF format) to Geoffrey Nathan (geoffnathan at wayne.edu) and José Antonio Mompeán,(mompean at um.es), and should reach us no later than August 30. Authors will be notified by September 5th whether their abstracts have been selected for the theme session. The theme session proposal will then be submitted to the organizers of the ICLC-9, who will notify us of acceptance or rejection by January 15th. [apologies for cross-posting] Geoffrey S. 
Nathan Faculty Liaison, Computing and Information Technology, and Associate Professor of English Linguistics Program Phone Numbers Department of English Computing and Information Technology: (313) 577-1259 Wayne State University Linguistics (English): (313) 577-8621 Detroit, MI, 48202 C&IT Fax: (313) 577-1338 From clements at indiana.edu Fri Aug 27 15:47:19 2004 From: clements at indiana.edu (clements) Date: Fri, 27 Aug 2004 10:47:19 -0500 Subject: extension of "the" Message-ID: Dear Funknetters, Does anyone know of any studies on the extension of the use of "the". In her home town (Stafford VA), a student of mine noted that "the" can be used: --With most acronyms I have the AOL. She has the SARS. --With generics I like the coffee/the candy. (to refer to all coffee or candy) --With many proper place names. These tend to be specific references, especially the store names. If my friend told me she was going to "the Pier 1," I would understand that she meant the Pier 1 in Central Park. We are going to the Nashville. I'm in the Target. He bought it at the Pier 1. I have heard it reported with abstract nouns, as in I have the diabetes and a colleague of mine in Fort Wayne IN reported hearing it from his students. Any leads would be most welcome. If there's interest, I'll write up a summary. Clancy Clements From john at research.haifa.ac.il Sat Aug 28 03:56:06 2004 From: john at research.haifa.ac.il (john at research.haifa.ac.il) Date: Sat, 28 Aug 2004 06:56:06 +0300 Subject: extension of "the" Message-ID: Dear Clancy, This is completely off the top of my head, but I noticed that 'the' is used with chain stores in Michigan ('the Kroger's') but not in the northeast (at least 10 years ago). In Philadelphia in the 1980's Blacks but not Whites said 'the AIDS'. John Myhill Quoting clements : > Dear Funknetters, > Does anyone know of any studies on the extension of the use of "the". 
In > her home town (Stafford VA), a student of mine noted that "the" can be > used: > > --With most acronyms > I have the AOL. > She has the SARS. > > --With generics > I like the coffee/the candy. (to refer to all coffee or candy) > > --With many proper place names. These tend to be specific references, > especially the store names. If my friend told me she was going to "the > Pier 1," I would understand that she meant the Pier 1 in Central Park. > We are going to the Nashville. > I'm in the Target. > He bought it at the Pier 1. > > I have heard it reported with abstract nouns, as in > > I have the diabetes > > and a colleague of mine in Fort Wayne IN reported hearing it from his > students. > > Any leads would be most welcome. If there's interest, I'll write up a > summary. > > Clancy Clements > > > > ------------------------------------------------------ This mail sent through IMP Webmail of Haifa University http://webmail.haifa.ac.il From language at sprynet.com Sun Aug 29 20:24:33 2004 From: language at sprynet.com (Alexander Gross) Date: Sun, 29 Aug 2004 16:24:33 -0400 Subject: extension of "the" Message-ID: > Does anyone know of any studies on the extension of the use of "the". > In her home town (Stafford VA), a student of mine noted that "the" > can be used: I find it fascinating that anyone would assume that "the" might have a "normal" use which could then be subject to extension. And that there would be any studies which could conceivably place its usage within any sort of normative range at all or explore the possible range of extensions. I wonder if this may be just one further offshoot from the illusion shared by many linguists that the guiding principles of language have been discovered, described, and even codified. Or in Steven Pinker's words, linguists have found "the single mental design underlying" all languages and "we all have the same minds." 
Four years ago I issued a challenge not only to all those on the sci.lang USENET newsgroup concerning this matter, it was in fact a repeat of the very challenge I had also issued a few years earlier to one of the foremost founders of AI, a master mathematician and a name so eminent as to require no further airing here (though the curious may discover it by running a Deja search on the sci.lang archives). Neither this expert nor the linguists on sci.lang were able to come up with a response to this challenge. I am now readdressing it to my colleagues on FUNKNET to discover if they will fare any better with it. The challenge went as follows: --------------------------------------------- Since you (singular and plural) imagine that it will one day be possible to construct an "adequate" machine translation system, here is *your* little assignment. It's easy, it's all in English. I want you to come up with the precise, practical rules by which we decide to put "the" in front of a noun as opposed to when we decide to put "a" or "an" in front of a noun as opposed to when we decide to put absolutely nothing ("zero-grade article") in front of a noun. Also: precisely when do we have a choice between two possible methods? Further requirement: the rule or rules you come up with have to work for ALL instances of putting articles in front of nouns. The rules should be so fool-proof and logically transparent that we can even make an expert system paradigm out of them, so that anyone who needed to know which rule to apply could simply consult the expert system and find the right answer. You'll need something like this for that "adequate" MT system--it will be crucial to spell out these rules for English, especially since some fairly different ones apply to almost any foreign language you can name. 
And even languages without articles as such, like Russian and Chinese, have a few quirks in this regard, to say nothing of the problems of translating all these languages into and out of English. Today's most advanced MT systems get all this wrong as often as right. But there's another and even better reason for coming up with a solution. I've tried this task more than once, so it's more than an idle riddle. I was first asked to come up with a solution by a Chinese senior revisor & computer linguist friend at the UN translation department who himself had trouble deciding which article to use. I was eager to solve it for him, and I was almost certain I could come up with the solution quite easily. I was also interested because some of my students in a translator-training course I was then teaching also asked me for the same solution. They really needed the answer, because they continually made mistakes with articles both in their writing and speech, which made it sound as though all they could manage was "broken English." And this is what many people think when foreigners get their articles wrong, either in speech or in translations. But these were perfectly literate & intelligent people--they just couldn't figure out the rules for English articles. The point here is not merely to come up with the usual explanation for this problem (which amounts to little more than saying "when something is definite, it takes the definite article, when something is indefinite, it...). The point IS to come up with a clear set of rules that can help foreigners to learn English. And beyond that can incidentally also serve as the basis for an "adequate" MT program. Perhaps you also will make the mistake of supposing--as I did--that this is a trivial problem. Believe me--it isn't. I had no trouble coming up with the first two or three rules, but there were still many inexplicable instances, where I had to say lamely to my students "Learn the Language." 
I ended up weaseling out by telling both my students and my friend at the UN to read the NY Times & other sources & try to figure out for themselves why "a" or "the" or neither one is used. As Martin Kay has pointed out, you can throw all the computing power in the world at MT and still come up empty. At what point does a trivial problem become an intractable one? -------------------------------- Let me reiterate that while this may look like a simple problem, it isn't. Using an If, Then, Else logical framework, I tried to build something like an expert system that could represent its terms but couldn't truly get beyond the first few rules. The permissible range for using our articles varies not merely between British and American English but within our own US variety according to differences of region, class, education, national origin, and age. It may even vary between members of the same family and over time within the usage of a single individual. And we're talking just about English here--imagine the complexities that arise when other languages are brought in. And since this is true for such an extremely small subset of structural linguistic problems in a single language, how much more true must it be for the august, all-embracing, universalist theory advanced by MIT linguists? To say nothing of all its cognitive this and that spinoffs? A French friend tells me the manual for French-English conversion of articles looks like a small law book, which even then is sure to have exceptions and omissions. If after decades of detailed rule-seeking and measurements and busy work on the "syntactic structures" of minute language byways our current school of linguists can't solve this problem, then what can they solve? very best to all! alex ----- Original Message ----- From: "clements" To: Cc: Sent: Friday, August 27, 2004 11:47 AM Subject: [FUNKNET] extension of "the" > Dear Funknetters, > Does anyone know of any studies on the extension of the use of "the". 
In > her home town (Stafford VA), a student of mine noted that "the" can be > used: > > --With most acronyms > I have the AOL. > She has the SARS. > > --With generics > I like the coffee/the candy. (to refer to all coffee or candy) > > --With many proper place names. These tend to be specific references, > especially the store names. If my friend told me she was going to "the > Pier 1," I would understand that she meant the Pier 1 in Central Park. > We are going to the Nashville. > I'm in the Target. > He bought it at the Pier 1. > > I have heard it reported with abstract nouns, as in > > I have the diabetes > > and a colleague of mine in Fort Wayne IN reported hearing it from his > students. > > Any leads would be most welcome. If there's interest, I'll write up a > summary. > > Clancy Clements > > > > From jrubba at calpoly.edu Sun Aug 29 20:50:17 2004 From: jrubba at calpoly.edu (Johanna Rubba) Date: Sun, 29 Aug 2004 13:50:17 -0700 Subject: "the" Message-ID: Southern Californians are known for their use of "the" in front of freeway numbers: the 5, the 405, the 101, etc. I think this is mostly a Southern Cal. usage; heard less often in the northern half of the state. I imagine it comes from shortening "the 405 freeway". The areas that do not use the article leave out the word "freeway" and just say "take 405 south ... " Maybe someone else has attested facts on this variation. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Johanna Rubba Associate Professor, Linguistics English Department, California Polytechnic State University One Grand Avenue • San Luis Obispo, CA 93407 Tel. (805)-756-2184 • Fax: (805)-756-6374 • Dept. Phone. 756-2596 • E-mail: jrubba at calpoly.edu • Home page: http://www.cla.calpoly.edu/~jrubba ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From adamzero at uchicago.edu Sun Aug 29 20:55:52 2004 From: adamzero at uchicago.edu (adam e leeds) Date: Sun, 29 Aug 2004 15:55:52 -0500 Subject: Underpinnings of functional linguistics? 
Message-ID: Greetings all, I have a request to make of you, but to preface, a short introductory statement seems to be in order. I'm an undergraduate at the University of Chicago, soon to graduate and hopefully soon to enter a graduate program in anthropological linguistics. My interests include, painting with the broad brush, indexicality/deixis, reference maintenance, the dynamics of face-to-face interaction, information structure, reported discourse and cognitive development, anti-realist holist coherentist contextualist theories of mind sign and world, and epistemological issues in the social sciences. My question is a basic one: Can any of you recommend for me good introductory and in-depth functional treatments of linguistics (articles and book-length), touching on any or all of: the main tenets, assumptions, arguments for, and structures of, methodological issues, etc.? There is a bewildering array of capital-lettered Functional Syntaxes out there, but I don't really know that they are the place to start. Thanks many times over in advance for your responses (which you might want to direct toward me, personally, rather than toward the list). Regards, Adam E. Leeds From rmalouf at mail.sdsu.edu Sun Aug 29 23:13:21 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Sun, 29 Aug 2004 16:13:21 -0700 Subject: extension of "the" In-Reply-To: <003101c48e06$33135730$79999c04@user1sznx2zyoc> Message-ID: On Aug 29, 2004, at 1:24 PM, Alexander Gross wrote: > Since you (singular and plural) imagine that it will one day be > possible to construct an "adequate" machine translation system, > here is *your* little assignment. It's easy, it's all in English. I > want you to come up with the precise, practical rules by which > we decide to put "the" in front of a noun as opposed to when > we decide to put "a" or "an" in front of a noun as opposed to > when we decide to put absolutely nothing ("zero-grade article") > in front of a noun. 
Also: precisely when do we have a choice > between two possible methods? While no one (that I know of) has written such rules, there's been considerable work on addressing this problem using machine learning and statistical models. For example, this paper reports some early experiments: http://citeseer.ist.psu.edu/minnen00memorybased.html I know they've improved on these results since then, but I can't find the reference off hand. At any rate, the performance of the best models is getting close to that of humans at guessing which article will be used in a given context. --- Rob Malouf rmalouf at mail.sdsu.edu Department of Linguistics and Oriental Languages San Diego State University From Salinas17 at aol.com Mon Aug 30 14:34:05 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 10:34:05 EDT Subject: The Chinese Diplomat's "the" Message-ID: In a message dated 8/29/04 4:25:21 PM, language at sprynet.com writes: << I was first asked to come up with a solution by a Chinese senior revisor & computer linguist friend at the UN translation department who himself had trouble deciding which article to use. >> In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: << At any rate, the performance of the best models is getting close to that of humans at guessing which article will be used in a given context. >> There's an irony to why one sees such adherence to structuralist criteria on the "functional" linguistics list. In most situations, of course, a computer model cannot possibly predict the use of "the" versus "a" unless it also reads minds. If Alex's Chinese diplomats are merely trying to avoid "Broken English", then should we assume that their English otherwise is 100% comprehensible? In other words, is it that they are never misunderstood, but are merely using an inappropriate "ungrammatical" English? And why would that trouble them? 
What is the consequence of a foreign diplomat speaking understandable but stylistically non-conforming English? Microsoft Word does a pretty good job of correcting inappropriate omission of an article before a singular noun. When I type in, "Will you please get car?", it tells me an article is missing and prompts me to choose between "a car" and "the car". It even tells me that one is definite and one is indefinite. No big deal. If a Chinese diplomat should say to his parking valet, "please get car", I don't imagine that the valet would interpret that as "any car" or "a car of your choosing". But if he showed up a few moments later with someone else's car, then we observers might definitely conclude there was "a failure to communicate," as the Boss says in Cool Hand Luke. If this misunderstanding were to persist, there might be a good practical reason for our diplomat to start using, "please get THE car" so that the valet knows which car is being referred to. But I imagine a diplomat would also think his function as a diplomat would be best served if he were well-versed in English and did not omit an article where English speakers would use one. It would enhance his job security. But even in that case, the controlling variable is probably not what our diplomat thinks about his English or his use of an article in a sentence. The controlling variable is how listeners respond to his English. A computer model that merely mimics human speech structure is rigged. How is it supposed to know whether I am referring to "a car" or "the car"? How is it to know my intention? The BIG trick we haven't reproduced is the one the UN parking valet performs. He knows that "please get car" refers to a specific car. And he only knows that because he can rule out the possibility that our diplomat means just any car. And the reason he knows that has more to do with the rules of car ownership and parking garages than it has to do with the rules of language. 
The real function of language is nearly always extra-linguistic. The difference between "the" and "a" is most often determined out there in the real world, not in the closed loop of structural linguistics. The consequence of omitting an article in English or misusing one has more to do with what will happen the next time than with the rules of grammar. If "please get car" impresses on the valet that we are important foreign diplomats and yields quicker service, we may just keep using it -- even if we are not Chinese diplomats. We shouldn't be fooled into thinking that, because we expect people to speak grammatically and they respond, arbitrary grammar rules are somehow built into us. On the other hand, where grammar rules have clear communication advantages, that should be enough to explain them. Regards, Steve Long From Salinas17 at aol.com Mon Aug 30 15:16:03 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 11:16:03 EDT Subject: "the" (2) Message-ID: In a message dated 8/29/04 4:50:52 PM, jrubba at calpoly.edu writes: << Southern Californians are known for their use of "the" in front of freeway numbers: the 5, the 405, the 101, etc. I think this is mostly a Southern Cal. usage; heard less often in the northern half of the state. >> The use of "the" before proper names is something I heard in the Midwest years ago, and it showed up in, for example, Al Franken's impression of the Minnesota rabbi giving his ecumenical approval to songs about "the Santa Claus". David Letterman often drifted into it when he was getting colloquial ("How many of you have tried the Popeye's string beans, huh?"). I always took it as being somehow from the German usage. The usage makes an unusual appearance in a current Burger King commercial where a motivational-type spokesperson with a British accent says "the Burger King" in referring to the chain -- probably the first time that has happened in a national fast food commercial. 
I heard it also in the sports nicknaming youth jargon of the 80's ("the Stevester"). It strikes me that it made a startling appearance in the title of Mel Gibson's movie -- "The Passion of the Christ." In all of these cases, what its function appears to be is to take a proper name and elevate it to a categorical "status." Regards, Steve Long From hstahlke at bsu.edu Mon Aug 30 15:18:56 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 10:18:56 -0500 Subject: "the" (2) Message-ID: I'm not sure that "the Christ" fits in with the other examples. The article is used there to make a particular theological point, in part, that "Christ" is not a name but a title, although there's more to it. Herb Stahlke In a message dated 8/29/04 4:50:52 PM, jrubba at calpoly.edu writes: << Southern Californians are known for their use of "the" in front of freeway numbers: the 5, the 405, the 101, etc. I think this is mostly a Southern Cal. usage; heard less often in the northern half of the state. >> The use of "the" before proper names is something I heard in the midwest years ago and it showed up in for example Al Franken's impression of the Minnesota rabbi giving his ecumenical approval to songs about "the Santa Claus". David Letterman often drifted into it when he was getting colloquial ("How many of you have tried the Popeye's string beans, huh?"). I always took it as being somehow from the German usage. The usage makes an unusual appearance in a current Burger King commercial where a motivational-type spokesperson with a British accent says "the Burger King" in referring to the chain -- probably the first time that has happened in a national fast food commercial. I heard it also in the sports nicknaming youth jargon of the 80's ("the Stevester"). It strikes me that it made a startling appearance in the title of Mel Gibson's movie -- "The Passion of the Christ." 
In all of these cases, what its function appears to be is to take a proper name and elevate it to a categorical "status." Regards, Steve Long From rmalouf at mail.sdsu.edu Mon Aug 30 15:22:27 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Mon, 30 Aug 2004 08:22:27 -0700 Subject: The Chinese Diplomat's "the" In-Reply-To: Message-ID: Hi, On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote: > In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: > << At any rate, the performance of the best models is getting close to > that > of humans at guessing which article will be used in a given context. >> > > There's an irony to why one sees such adherence to structuralist > criteria on > the "functional" linguistics list. In most situations, of course, a > computer > model cannot possibly predict the use of "the" versus "a" unless it > also reads > minds. It's hard for me to imagine anything less "structuralist" than an instance-based model like this one. The system produces an article for a sequence like "please get ___ car" by searching a reference corpus for similar patterns. If it finds sequences like "please get the car" more often than "please get a car" or "please get car", it produces a "the". The amazing thing is that this actually works! If we take a corpus, strip out all the articles, and use the system to try to recover them, it's right almost 85% of the time. This can be further improved somewhat by providing the system with an ontology of noun meanings (so it can draw generalizations about words which don't occur in the reference corpus but have very similar meanings to words which do). No, it's never going to be right 100% of the time, at least until we can read minds, but in most situations, very simple information about the context is all that's needed. 
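The corpus-lookup idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system from the cited paper: the three-sentence corpus is invented, the context is reduced to the single word on each side of the slot, and the names build_index and guess_article are hypothetical.

```python
from collections import Counter, defaultdict

ARTICLES = {"the", "a", "an"}
ZERO = ""  # stands for "no article in this slot"


def build_index(sentences):
    """Map (previous word, following word) -> counts of what filled the slot."""
    index = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        for i in range(1, len(words)):
            prev, cur = words[i - 1], words[i]
            if cur in ARTICLES and i + 1 < len(words):
                # An article sits between prev and the word after it.
                index[(prev, words[i + 1])][cur] += 1
            elif prev not in ARTICLES and cur not in ARTICLES:
                # Two adjacent non-article words: the slot between them is empty.
                index[(prev, cur)][ZERO] += 1
    return index


def guess_article(index, prev, following):
    """Return the most frequent filler for the slot, or None if never seen."""
    counts = index.get((prev.lower(), following.lower()))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented toy reference corpus (an assumption, for illustration only).
corpus = [
    "please get the car",
    "could you get the car now",
    "we should get a car at the airport",
]
index = build_index(corpus)
print(repr(guess_article(index, "get", "car")))   # "the" outnumbers "a" here
print(repr(guess_article(index, "rent", "car")))  # unseen context
```

With this toy corpus the slot in "get ___ car" is filled with "the" simply because that pattern is more frequent, which is also why "please get a car" would be guessed wrong. An unseen context returns None; a fuller system would need some way to back off to similar contexts instead of giving up.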
A system like this has obvious applications for machine translation, but the reason we first got to thinking about this problem was in the context of an adaptive communication system. We were working with an ALS patient who was completely paralyzed: he couldn't speak, move, or even breathe on his own, but by moving his eyes he could spell out simple messages. This was very fatiguing for him, and the messages tended to be highly telegraphic: "please get the car" might well come out as "ge cr". His family could understand what he meant, but no one else could. This program for generating articles was part of a larger system to "translate" things like "ge cr" into fluent, polite English: "please get the car". You might think that this could only be done reliably with full mind reading ability and/or a vast store of general world knowledge, and it's easy to make up isolated examples where that's true. But, it turns out that in real life it can be done remarkably well using very simple tricks. So, yeah, if he'd ever wanted to tell a valet to "please get a car", the system would have inserted an unwanted "the". Fortunately, hardly anyone ever does that, so the problem doesn't come up very often. --- Rob Malouf rmalouf at mail.sdsu.edu Department of Linguistics and Oriental Languages San Diego State University From Salinas17 at aol.com Mon Aug 30 15:37:45 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 11:37:45 EDT Subject: "the" (2) Message-ID: In a message dated 8/30/04 11:18:46 AM, hstahlke at bsu.edu writes: << I'm not sure that "the Christ" fits in with the other examples. The article is used there to make a particular theological point, in part, that "Christ" is not a name but a title, although there's more to it. >> I'm sure you're correct. But consider how often that theological point could have been made in the media and elsewhere when "Christ" has been mentioned in the past. 
The departure in the name of a film is striking and may have filtered into other uses or perhaps reflect an on-going trend. I think it is more popularly "understandable" today than it would have been in, say, the '50's in America. And in that sense, the "title" is used to refer to what is usually treated as a proper name and altered to connote a status. In that, it has the same connotation as something as profane as "The Shaq." (Or in a version just recently heard -- "...The Albert Einstein of rap music.") Regards, Steve From hstahlke at bsu.edu Mon Aug 30 15:50:14 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 10:50:14 -0500 Subject: "the" (2) Message-ID: Steve, Given Gibson's background, I suspect he was using "the Christ" in its traditional theological sense, but given the theological background of many of those who made the movie into a cause, I suspect you are right. Herb In a message dated 8/30/04 11:18:46 AM, hstahlke at bsu.edu writes: << I'm not sure that "the Christ" fits in with the other examples. The article is used there to make a particular theological point, in part, that "Christ" is not a name but a title, although there's more to it. >> I'm sure you're correct. But consider how often that theological point could have been made in the media and elsewhere when "Christ" has been mentioned in the past. The departure in the name of a film is striking and may have filtered into other uses or perhaps reflect an on-going trend. I think it is more popularly "understandable" today than it would have been in, say, the '50's in America. And in that sense, the "title" is used to refer to what is usually treated as a proper name and altered to connote a status. In that, it has the same connotation as something as profane as "The Shaq." 
(Or in a version just recently heard -- "...The Albert Einstein of rap music.") Regards, Steve From Salinas17 at aol.com Mon Aug 30 16:01:59 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 12:01:59 EDT Subject: The Chinese Diplomat's "the" (2) Message-ID: In a message dated 8/30/04 11:22:51 AM, rmalouf at mail.sdsu.edu writes: << So, yeah, if he'd ever wanted to tell a valet to "please get a car", the system would have inserted an unwanted "the". Fortunately, hardly anyone ever does that, so the problem doesn't come up very often. >> "...get a car." It is what I say all the time in reference to rental cars at the airport. And guys like Tony Soprano might say it with regard to the cars they want gotten. You're working with a limited context. In any case, the actual odds are extra-linguistic. Otherwise they are 50-50 to a machine that knows nothing about the issues of car ownership or how many cars are in the family garage and what options are being offered by saying "get a car" versus "get the car." << It's hard for me to imagine anything less "structuralist" than an instance-based model like this one. The system produces an article for a sequence like "please get ___ car" by searching a reference corpus for similar patterns. >> It is completely structural in how it gets to output. That's not to say you are not doing a good thing in practical terms. But the fact that you've found predictability in the patterns of speech doesn't necessarily provide an explanation of those patterns -- other than perhaps we are in the habit of talking about the same things for the same reasons in the same ways from day to day. If your "speaker" was misunderstood despite the machine being accurate, that would be a "functional" matter. If the function of speech is communication, we can presume that a variety of structures might achieve the same understanding -- e.g., "get the car [I want to go for a ride]" or "get a car [I want to go for a ride]" or "I want to go for a ride". SLong From ellen at central.cis.upenn.edu Mon Aug 30 16:04:14 2004 From: ellen at central.cis.upenn.edu (Ellen F. 
Prince) Date: Mon, 30 Aug 2004 12:04:14 EDT Subject: The Chinese Diplomat's "the" In-Reply-To: Your message of "Mon, 30 Aug 2004 08:22:27 PDT." <66D6AE42-FA98-11D8-BA73-000D932A40AE@mail.sdsu.edu> Message-ID: R. Malouf writes: >Hi, > >On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote: >> In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: >> << At any rate, the performance of the best models is getting close to >> that >> of humans at guessing which article will be used in a given context. >> >> >> There's an irony to why one sees such adherence to structuralist >> criteria on >> the "functional" linguistics list. In most situations, of course, a >> computer >> model cannot possibly predict the use of "the" versus "a" unless it >> also reads >> minds. > >It's hard for me to imagine anything less "structuralist" than an >instance-based model like this one. The system produces an article for >a sequence like "please get ___ car" by searching a reference corpus >for similar patterns. If it finds sequences like "please get the car" >more often than "please get a car" or "please get car", it produces a >"the". > >The amazing thing is that this actually works! If we take a corpus, >strip out all the articles, and use the system to try to recover them, >it's right almost 85% of the time. This can be further improved >somewhat by providing the system with an ontology of noun meanings (so >it can draw generalizations about words which don't occur in the >reference corpus but have very similar meanings to words which do). >No, it's never going to be right 100% of the time, at least until we >can read minds, but in most situations, very simple information about >the context is all that's needed. This may be an attractive solution for producing software for the market -- but it is simply hilarious as any sort of model of how humans use language. Imagine two company robots flying to a remote destination together. 
One has the kind of software you are describing; the other has human-like competence in the use of articles. After collecting their baggage, the one with your (kind of) software says to the other one, 'I've just realized that we need the car, please.' Being an obedient robot and understanding the request as a human would, the requestee boards the next flight back home, since the only thing s/he/it can infer from _the car_ in this context is their company car... The fact that people typically drive their own car, which is Hearer-known or Inferrable and hence typically definite, more often than a rental car, which can be Hearer-new and hence typically indefinite, is profoundly irrelevant to human language processing/competence -- even if it'll get the software developer safely thru a demo (almost) 85 out of 100 times... And, by the way, to deal with linguistic reference, we only have to 'read minds' as well as the average speaker does -- i.e. not at all. What we need is a large and relevant knowledge-base and a system of plausible reasoning, both needed anyway for other aspects of AI, as well as some form-function correspondences for each language. IOW, we need what language users have. Ellen Prince From ellen at central.cis.upenn.edu Mon Aug 30 16:13:46 2004 From: ellen at central.cis.upenn.edu (Ellen F. Prince) Date: Mon, 30 Aug 2004 12:13:46 EDT Subject: "the" (2) In-Reply-To: Your message of "Mon, 30 Aug 2004 11:37:45 EDT." <158.3dcf1d6e.2e64a3c9@aol.com> Message-ID: _Christ_ may be considered a title but it's ultimately a common noun (or adjective used as such) meaning 'anointed (one)'. I would imagine that it's that sense that's being emphasized when the article is used, as in Mel Gibson's movie title. Ellen Prince From hstahlke at bsu.edu Mon Aug 30 16:18:11 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 11:18:11 -0500 Subject: "the" (2) Message-ID: Correct, as a Greek translation of Aramaic meshiha, Hebrew mashah. 
Herb -----Original Message----- From: funknet-bounces at mailman.rice.edu [mailto:funknet-bounces at mailman.rice.edu] On Behalf Of Ellen F. Prince Sent: Monday, August 30, 2004 11:14 AM To: funknet at mailman.rice.edu Subject: Re: [FUNKNET] "the" (2) _Christ_ may be considered a title but it's ultimately a common noun (or adjective used as such) meaning 'anointed (one)'. I would imagine that it's that sense that's being emphasized when the article is used, as in Mel Gibson's movie title. Ellen Prince From Salinas17 at aol.com Mon Aug 30 16:41:09 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 12:41:09 EDT Subject: "the" (2) Message-ID: In a message dated 8/30/04 12:18:04 PM, hstahlke at bsu.edu writes: << Correct, as a Greek translation of Aramaic meshiha, Hebrew mashah. >> Herb- Was chrio:/christos ever used in Greek before Christianity in the sense of "anoint in consecration" or did it ever appear as a title (proper noun)? In other words, did the translation also carry a new meaning into Greek? I think earlier Hebrew kings were also called "the anointed ones" in Hebrew. Steve From rmalouf at mail.sdsu.edu Mon Aug 30 17:08:36 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Mon, 30 Aug 2004 10:08:36 -0700 Subject: The Chinese Diplomat's "the" (2) In-Reply-To: Message-ID: On Mon, 2004-08-30 at 09:01, Salinas17 at aol.com wrote: > In a message dated 8/30/04 11:22:51 AM, rmalouf at mail.sdsu.edu writes: > << So, yeah, if he'd ever wanted to tell a valet to "please get a car", the > system would have inserted an unwanted "the". Fortunately, hardly anyone ever > does that, > so the problem doesn't come up very often. >> > > "...get a car." It is what I say all the time in reference to rental cars at > the airport. And guys like Tony Soprano might say it with regard to the cars > they want gotten. You're working with a limited context. In any case, the > actual odds are extra-linguistic. 
Why draw a distinction between linguistic and extra-linguistic factors? I thought we were functionalists here! :-) As I said, it's easy to construct examples which confound a system like this. The striking thing is that such examples are fairly rare in actual language use. A very simple program is able to guess the right article for 85% of the common nouns from a sample of the Wall Street Journal. Of the remaining 15%, some of the articles generated by the system would work just as well as the original one in the text, so the actual rate of "wrong" predictions is somewhat less than 15%. And, of the remaining errors, many would be resolved correctly if we just had a larger reference corpus. As a linguist, I think the fact that such an obviously inadequate system performs as well as it does is interesting. Not because it gives us a plausible model of human language processing, but because it gives an empirical measure of just how rare the truly hard cases are. > <<It's hard for me to imagine anything less "structuralist" than an > instance-based model like this one. The system produces an article for a sequence like > "please get ___ car" by searching a reference corpus for similar patterns.>> > > It is completely structural in how it gets to output. How so? There's no grammar or grammaticality, no rules or categories, no notion of contrastive or complementary distribution. There is a gradient measure of sequence similarity, which I guess is a bit like the structuralist idea of an opposition, but it's not one I would expect Saussure or Bloomfield to endorse. True, the task the system was evaluated on is structuralistish, but that's only because it's easy to measure the results of, and since it's at least as hard as the task we really care about (finding an article which does the right thing in a given context), it gives us an upper bound on the error rate. 
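The upper-bound argument above can be made concrete with a small scoring sketch. The data below are invented for illustration and the function name error_rates is hypothetical; the point is only that exact-match scoring against the original text can never report fewer errors than scoring that also accepts articles a reader would find equally good.

```python
def error_rates(predicted, original, also_acceptable):
    """Return (strict, lenient) error rates for a list of article slots.

    strict:  a guess is wrong unless it matches the article in the text.
    lenient: a mismatch is forgiven if the guess is in the slot's set of
             articles judged to work just as well.
    """
    strict_wrong = 0
    lenient_wrong = 0
    for guess, gold, ok in zip(predicted, original, also_acceptable):
        if guess == gold:
            continue
        strict_wrong += 1
        if guess not in ok:
            lenient_wrong += 1
    total = len(original)
    return strict_wrong / total, lenient_wrong / total


# Invented toy data: "" stands for no article.
predicted = ["the", "the", "a", "the", ""]
original = ["the", "a", "a", "", ""]
also_acceptable = [set(), {"the"}, set(), set(), set()]

strict, lenient = error_rates(predicted, original, also_acceptable)
print(strict, lenient)  # the lenient rate cannot exceed the strict rate
```

Because every lenient error is also a strict error, the strict figure (the 15% in the message above) bounds the error rate on the task of finding an article that does the right thing in context.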
[Actually, to be honest, if you read the fine print, some notion of category does get smuggled in by the back door in this particular system, but that's not a necessary feature of a memory-based model.] > But the fact that you've found > predictability in the patterns of speech doesn't necessarily provide an > explanation of those patterns -- other than perhaps we are in the habit of talking > about the same things for the same reasons in the same ways from day to day. What more explanation do you need? ;-) -- Rob Malouf Department of Linguistics and Oriental Languages San Diego State University From hstahlke at bsu.edu Mon Aug 30 17:32:59 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 12:32:59 -0500 Subject: "the" (2) Message-ID: Steve, The Septuagint (3rd - 2nd c. BCE) uses christou in ISam12:3 to mean "anointed one". The form shows up regularly in Samuel/Kings. In ISam24:6 David refers to Saul as "the lord's anointed", using to: christo: kyriou. But I don't have a Classical Greek concordance handy, so I don't know how it would have been used in that body of literature where a notion of messiah didn't exist. Herb -----Original Message----- From: Salinas17 at aol.com [mailto:Salinas17 at aol.com] Sent: Monday, August 30, 2004 11:41 AM To: FUNKNET at LISTSERV.RICE.EDU Subject: Re: [FUNKNET] "the" (2) In a message dated 8/30/04 12:18:04 PM, hstahlke at bsu.edu writes: << Correct, as a Greek translation of Aramaic meshiha, Hebrew mashah. >> Herb- Was chrio:/christos ever used in Greek before Christianity in the sense of "anoint in consecration" or did it ever appear as a title (proper noun)? In other words, did the translation also carry a new meaning into Greek? I think earlier Hebrew kings were also called "the anointed ones" in Hebrew. 
Steve From language at sprynet.com Mon Aug 30 18:02:31 2004 From: language at sprynet.com (Alexander Gross) Date: Mon, 30 Aug 2004 14:02:31 -0400 Subject: Fw: [FUNKNET] extension of "the" Message-ID: ----- Original Message ----- From: "Alexander Gross" To: "Rob Malouf" Sent: Sunday, August 29, 2004 9:42 PM Subject: Re: [FUNKNET] extension of "the" > Thanks, Rob, i've been reading similar literature for the past 25 years & > first discussed this problem in the early 'sixties with my brother-in-law > Morton Astrahan, the IBM VP then in charge of preparing their MT project. > He was pretty sure they'd have most of the bugs ironed out in time for their > demonstration at the NY World's Fair of 1964. > > A lot of this depends on who is doing the reporting. You might want to look > at the following on-line report in the current Translation Journal: > > Machine Translation and Computer-Assisted Translation: a New Way of > Translating? > by Olivia Craciunescu, Constanza Gerding-Salas, and Susan Stringer-O'Keeffe, > > it's at: > > http://www.accurapid.com/journal/29computers.htm > > very best! > > alex > > > ----- Original Message ----- > From: "Rob Malouf" > To: "Alexander Gross" > Cc: > Sent: Sunday, August 29, 2004 6:21 PM > Subject: Re: [FUNKNET] extension of "the" > > > > > > On Aug 29, 2004, at 1:24 PM, Alexander Gross wrote: > > > Since you (singular and plural) imagine that it will one day be > > > possible to construct an "adequate" machine translation system, > > > here is *your* little assignment. It's easy, it's all in English. I > > > want you to come up with the precise, practical rules by which > > > we decide to put "the" in front of a noun as opposed to when > > > we decide to put "a" or "an" in front of a noun as opposed to > > > when we decide to put absolutely nothing ("zero-grade article") > > > in front of a noun. Also: precisely when do we have a choice > > > between two possible methods? 
> > > > While no one (that I know of) has written such rules, there's been > > considerable work on addressing this problem using machine learning and > > statistical models. For example, this paper reports some early > > experiments: > > > > http://citeseer.ist.psu.edu/minnen00memorybased.html > > > > I know they've improved on these results since then, but I can't find > > the reference off hand. At any rate, the performance of the best > > models is getting close to that of humans at guessing which article > > will be used in a given context. > > --- > > Rob Malouf > > rmalouf at mail.sdsu.edu > > Department of Linguistics and Oriental Languages > > San Diego State University > > > > > From language at sprynet.com Mon Aug 30 18:31:07 2004 From: language at sprynet.com (Alexander Gross) Date: Mon, 30 Aug 2004 14:31:07 -0400 Subject: The Chinese Diplomat's "the" Message-ID: > The amazing thing is that this actually works! If we take a corpus, > strip out all the articles, and use the system to try to recover them, > it's right almost 85% of the time. I'm disappointed to see that claims like "it's right almost 85% of the time" are still being advanced by MT advocates. Here's what I had to say about this twelve years ago in my Limitations of Computers as Translation Tools (in Computers in Translation: A Practical Approach, Routledge, 1992): --------------------------------------------- Also often encountered in the literature are percentage claims purportedly grading the efficiency of computer translation systems. Thus, one language pair may be described as `90% accurate' or `95% accurate' or occasionally only `80% accurate.' The highest claim I have seen so far is `98% accurate.' Such ratings may have more to do with what one author has termed spreading `innumeracy' than with any meaningful standards of measurement. 
On a shallow level of criticism, even if we accepted a claim of 98% accuracy at face value (and even if it could be substantiated), this would still mean that every standard double-spaced typed page would contain five errors--potentially deep substantive errors, since computers, barring a glitch, never make simple mistakes in spelling or punctuation. It is for the reader to decide whether such an error level is tolerable in texts that may shape the cars we drive, the medicines and chemicals we take and use, the peace treaties that bind our nations. As for 95% accuracy, this would mean one error on every other line of a typical page, while with 90% accuracy we are down to one error in every line. Translators who have had to post-edit such texts tend to agree that with percentage claims of 90% or less it is easiest to have a human translator start all over again from the original text. On a deeper level, claims of 98% accuracy may be even more misleading--does such a claim in fact mean that the computer has mastered 98% of perfectly written English or rather 98% of minimally acceptable English? Is it possible that 98% of the latter could turn out to be 49% of the former? There is a great difference between the two, and so far these questions have not been addressed. ----------------------------------------------------- (Full text of this piece available on my website under the Linguistics/MT menu at:) http://language.home.sprynet.com very best to all! alex ----- Original Message ----- From: "Rob Malouf" To: Cc: Sent: Monday, August 30, 2004 11:22 AM Subject: [FUNKNET] Re: The Chinese Diplomat's "the" > Hi, > > On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote: > > In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: > > << At any rate, the performance of the best models is getting close to > > that > > of humans at guessing which article will be used in a given context. 
>> > > > > There's an irony to why one sees such adherence to structuralist > > criteria on > > the "functional" linguistics list. In most situations, of course, a > > computer > > model cannot possibly predict the use of "the" versus "a" unless it > > also reads > > minds. > > It's hard for me to imagine anything less "structuralist" than an > instance-based model like this one. The system produces an article for > a sequence like "please get ___ car" by searching a reference corpus > for similar patterns. If it finds sequences like "please get the car" > more often than "please get a car" or "please get car", it produces a > "the". > > The amazing thing is that this actually works! If we take a corpus, > strip out all the articles, and use the system to try to recover them, > it's right almost 85% of the time. This can be further improved > somewhat by providing the system with an ontology of noun meanings (so > it can draw generalizations about words which don't occur in the > reference corpus but have very similar meanings to words which do). > No, it's never going to be right 100% of the time, at least until we > can read minds, but in most situations, very simple information about > the context is all that's needed. > > A system like this has obvious applications for machine translation, > but the reason we first got to thinking about this problem was in the > context of an adaptive communication system. We were working with an > ALS patient who was completely paralyzed: he couldn't speak, move, or > even breathe on his own, but by moving his eyes he could spell out > simple messages. This was very fatiguing for him, and the messages > tended to be highly telegraphic: "please get the car" might well come > out as "ge cr". His family could understand what he meant, but no one > else could. This program for generating articles was part of a larger > system to "translate" things like "ge cr" into fluent, polite English: > "please get the car". 
You might think that this could only be done > reliably with full mind reading ability and/or a vast store of general > world knowledge, and it's easy to make up isolated examples where > that's true. But, it turns out that in real life it can be done > remarkably well using very simple tricks. So, yeah, if he'd ever > wanted to tell a valet to "please get a car", the system would have > inserted an unwanted "the". Fortunately, hardly anyone ever does that, > so the problem doesn't come up very often. > --- > Rob Malouf > rmalouf at mail.sdsu.edu > Department of Linguistics and Oriental Languages > San Diego State University > > From jrubba at calpoly.edu Mon Aug 30 19:22:38 2004 From: jrubba at calpoly.edu (Johanna Rubba) Date: Mon, 30 Aug 2004 12:22:38 -0700 Subject: "the" (2) Message-ID: I don't see "the 405" as placement of an article before a proper name. I do believe it is a short form of "the 405 freeway." If you've listened to enough LA radio traffic reports, you hear alternation between the shorter and longer usage. And perhaps people more expert on SoCal usage can chime in as to whether So. Californians use "the" in front of other proper names. I don't have any awareness of such. I do not hear the usages Steve Long reports, e.g. "the Santa Claus" or "the Popeye". As to "the Christ", I'm sure Gibson was using it in the traditional theological sense, this being a very fundamentalist Catholic movie. But somehow I doubt that this film is responsible for the spread of such usages. It's too recent. "The Donald" has been in common use since long before Gibson's film appeared. My intuition tells me that "the" is inserted in such cases as a campy acknowledgment of his (supposed?) uniqueness and fame, as we say "the sun" and "the moon", because we can be sure everyone knows which sun or moon (or Donald) we are talking about. Re British "the Burger King", this has a familiar ring to me. 
But my memories of British English are too foggy to verify or come up with other examples. Surely there are some Brits out there who subscribe to Funknet ... ? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Johanna Rubba Associate Professor, Linguistics English Department, California Polytechnic State University One Grand Avenue • San Luis Obispo, CA 93407 Tel. (805)-756-2184 • Fax: (805)-756-6374 • Dept. Phone. 756-2596 • E-mail: jrubba at calpoly.edu • Home page: http://www.cla.calpoly.edu/~jrubba ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From Salinas17 at aol.com Mon Aug 30 20:06:43 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 16:06:43 EDT Subject: "the" (3) Message-ID: In a message dated 8/30/04 3:23:41 PM, jrubba at calpoly.edu writes: << I don't see "the 405" as placement of an article before a proper name. I do believe it is a short form of "the 405 freeway." If you've listened to enough LA radio traffic reports, you hear alternation between the shorter and longer usage. >> "The 405 freeway" is for most purposes already a proper name. Like the Hudson (always meaning the river, not the explorer or the Bay). There are no other members of the category. It already has separate status from other freeways. The license to shorten it doesn't change that. Chesterton said something about how it was never polite to say "the Queen of England" because it had to be assumed everyone knew what queen you were talking about when you said "the Queen". <> You don't watch enough television. Letterman uses the form a lot. How about "The Shaq"? I heard it during the playoffs. Did you watch the playoffs? Steve Long From david.kronenfeld at ucr.edu Mon Aug 30 20:14:02 2004 From: david.kronenfeld at ucr.edu (David B. Kronenfeld) Date: Mon, 30 Aug 2004 13:14:02 -0700 Subject: "the" (2) Message-ID: Mostly I agree with you. But we do hear or see occasional usage of expressions like "the Donald" or "the Arnold".
When used they seem to be a way of being a little cute--and of implying that the person in question has become something of either a caricature or a trademark. And, for my examples, "the Arnold" kind of trails after "the terminator"--but as a way of cutting him down a little, while "the Donald" sort of cuts our supreme trumpeter down a bit while also making clear that we are talking about a business trademark (not just any old "Donald", but "the Donald"). Language remains a moving target and we continue to do funny things with it. Cheers, David At 12:22 PM 8/30/2004, Johanna Rubba wrote: >I don't see "the 405" as placement of an article before a proper name. I >do believe it is a short form of "the 405 freeway." If you've listened to >enough LA radio traffic reports, you hear alternation between the shorter >and longer usage. And perhaps people more expert on SoCal usage can chime >in as to whether So. Californians use "the" in front of other proper >names. I don't have any awareness of such. I do not hear the usages Steve >Long reports, e.g. "the Santa Claus" or "the Popeye". > >As to "the Christ", I'm sure Gibson was using it in the traditional >theological sense, this being a very fundamentalist Catholic movie. But >somehow I doubt that this film is responsible for the spread of such >usages. It's too recent. "The Donald" has been in common use since long >before Gibson's film appeared. My intuition tells me that "the" is >inserted in such cases as a campy acknowledgment of his (supposed?) >uniqueness and fame, as we say "the sun" and "the moon", because we can be >sure everyone knows which sun or moon (or Donald) we are talking about. > >Re British "the Burger King", this has a familiar ring to me. But my >memories of British English are too foggy to verify or come up with other >examples. Surely there are some Brits out there who subscribe to Funknet ... ? 
> >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >Johanna Rubba Associate Professor, Linguistics >English Department, California Polytechnic State University >One Grand Avenue • San Luis Obispo, CA 93407 >Tel. (805)-756-2184 • Fax: (805)-756-6374 • Dept. Phone. 756-2596 >• E-mail: jrubba at calpoly.edu • Home page: http://www.cla.calpoly.edu/~jrubba >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ David B. Kronenfeld Phone Office 951/827-4340 Department of Anthropology Message 951/827-5524 University of California Fax 951/951-5409 Riverside, CA 92521 email david.kronenfeld at ucr.edu Department: http://Anthropology.ucr.edu/ Personal: http://pages.sbcglobal.net/david-judy/david.html Society for Anthropological Sciences: http://anthrosciences.org/index. From Salinas17 at aol.com Mon Aug 30 20:27:35 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 16:27:35 EDT Subject: "the" (3) Message-ID: In a message dated 8/30/04 1:32:51 PM, hstahlke at bsu.edu writes: << But I don't have a Classical Greek concordance handy, so I don't know how it would have been used in that body of literature where a notion of messiah didn't exist. >> In Liddell-Scott, the first Greek references to "anoint as a consecration" are Christian. And I don't see it as an epithet in Greek before Christ. Before that it's mainly about smearing oil on the body or white-washing a house or stucco -- nothing particularly religious. The meaning of "anointing" in Greek seems pretty concrete and mundane at an earlier time. So here, it seems, is a Greek word that changed drastically in its main meaning when it was used to translate a foreign word. A small lesson perhaps in how new ideas travel as a change in words. Steve From language at sprynet.com Mon Aug 30 20:35:59 2004 From: language at sprynet.com (Alexander Gross) Date: Mon, 30 Aug 2004 16:35:59 -0400 Subject: extension of "the" Message-ID: Thanks, Wendy.
It will be two weeks before i am back in NYC & can look up Peter Master's contributions on this subject. But so far as i can see from summaries on the web, i don't think he and i would have too many differences here. That's because he is concerned with practical solutions for helping ESL students to learn and not merely airy universalist linguistic cause-mongering or promotions for MT, just as i am concerned with training translators & breaking through to describing how language actually works. So far as i can tell, he maintains that grammar can be usefully taught in ESL courses and breaks down his method for treating articles into beginning, intermediate, and advanced phases (none of which is going to help MT programmers very much). OTOH i believe there is a dispute within the ESL community between those who emphasize teaching grammar & those who just want to start their students talking some form of English. It's probable that most of us who learn foreign languages as adults will never speak them perfectly, whichever course is chosen. My Spanish was good enough 50 years ago to get me a job as a bilingual radio announcer in Madrid for Radio Nacional de Espan~a. And I've boasted that I speak it fluently ever since, which in some ways I do. And I'm also fairly justified in my claim that I can speak, read, translate from, interpret brief dialogs into, and even write (with some help from a native editor) five or six languages (six including both British and American English :-) ). But I'm just now going through the hard slogging of preparing for a conference in Xalapa, Mex., and I'm becoming painfully aware of how "broken" my Spanish really is. But at least I'm aware of it, which means that I can improve it a bit. Contrary to Steve's fantasies that all language can be broken down to Roger Schank-like scenarios involving dialogues with car valets, both grammar and accent really do matter in most languages. very best to all! 
alex ----- Original Message ----- From: WENDY SMITH To: Alexander Gross Cc: clements ; funknet at mailman.rice.edu ; rronques at indiana.edu Sent: Sunday, August 29, 2004 4:40 PM Subject: Re: [FUNKNET] extension of "the" See work by Peter Master. He did his dissertation (UCLA) on this topic. ----- Original Message ----- From: Alexander Gross Date: Sunday, August 29, 2004 1:24 pm Subject: Re: [FUNKNET] extension of "the" > > > > > Does anyone know of any studies on the extension of the use of > "the". > In her home town (Stafford VA), a student of mine noted > that "the" > > can be used: > > I find it fascinating that anyone would assume that "the" might > have a > "normal" use which could then be subject to extension. And that > there would > be any studies which could conceivably place its usage within any > sort of > normative range at all or explore the possible range of extensions. > > I wonder if this may be just one further offshoot from the > illusion shared > by many linguists that the guiding principles of language have been > discovered, described, and even codified. Or in Steven Pinker's words, > linguists have found "the single mental design underlying" all > languages and > "we all have the same minds." > > Four years ago I issued a challenge not only to all those on the > sci.lang USENET newsgroup concerning this matter, it was in fact a > repeat of > the very challenge I had also issued a few years earlier to one of the > foremost founders of AI, a master mathematician and a name so > eminent as to > require no further airing here (though the curious may discover it by > running a Deja search on the sci.lang archives). > > Neither this expert nor the linguists on sci.lang were able to > come up with > a response to this challenge. I am now readdressing it to my > colleagues on > FUNKNET to discover if they will fare any better with it.
> > The challenge went as follows: > > --------------------------------------------- > > Since you (singular and plural) imagine that it will one day be > possible to construct an "adequate" machine translation system, > here is *your* little assignment. It's easy, it's all in English. > I > want you to come up with the precise, practical rules by which > we decide to put "the" in front of a noun as opposed to when > we decide to put "a" or "an" in front of a noun as opposed to > when we decide to put absolutely nothing ("zero-grade article") > in front of a noun. Also: precisely when do we have a choice > between two possible methods? > > Further requirement: the rule or rules you come up with have to work > for ALL instances of putting articles in front of nouns. The rules > should be so fool-proof and logically transparent that we can even > make an expert system paradigm out of them, so that anyone who > needed to know which rule to apply could simply consult the expert > system and find the right answer. You'll need something like this > for that "adequate" MT system--it will be crucial to spell out these > rules for English, especially since some fairly different ones apply > to almost any foreign language you can name. And even languages > without articles as such, like Russian and Chinese, have a few > quirks in this regard, to say nothing of the problems of translating > all these languages into and out of English. Today's most advanced > MT systems get all this wrong as often as right. > > But there's another and even better reason for coming up with a > solution. I've tried this task more than once, so it's more than > an idle > riddle. I was first asked to come up with a solution by a > Chinese senior revisor & computer linguist friend at the UN > translation department who himself had trouble deciding which > article to use. > I was eager to solve it for him, and I was almost certain I could come > up with the solution quite easily.
I was also interested because some > of my students in a translator-training course I was then teaching > also asked me for the same solution. > > They really needed the answer, because they continually made > mistakes with articles both in their writing and speech, which > made it sound as though all they could manage was "broken English." > And this is what many people think when foreigners get their articles > wrong, either in speech or in translations. But these were perfectly > literate & intelligent people--they just couldn't figure out the rules > for English articles. > > The point here is not merely to come up with the usual explanation > for this problem (which amounts to little more than saying "when > something is definite, it takes the definite article, when something > is indefinite, it...). The point IS to come up with a clear set of > rules that can help foreigners to learn English. And beyond that > can incidentally also serve as the basis for an "adequate" MT program. > > Perhaps you also will make the mistake of supposing--as I did--that > this is a trivial problem. Believe me--it isn't. I had no trouble > coming up with the first two or three rules, but there were still > many inexplicable instances, where I had to say lamely to my > students "Learn the Language." I ended up weaseling out by telling > both my students and my friend at the UN to read the NY Times & > other sources & try to figure out for themselves why "a" or "the" > or neither one is used. As Martin Kay has pointed out, you can > throw all the computing power in the world at MT and still come > up empty. At what point does a trivial problem become an > intractable one? > > -------------------------------- > > Let me reiterate that while this may look like a simple problem, > it isn't. > Using an If, Then, Else logical framework, I tried to build > something like > an expert system that could represent its terms but couldn't truly get > beyond the first few rules.
The permissible range for using our > articles varies not merely between British and American English but > within our own US > variety according to differences of region, class, education, national > origin, and age. It may even vary between members of the same > family and > over time within the usage of a single individual. > > And we're talking just about English here--imagine the > complexities that > arise when other languages are brought in. And since this is true > for such > an extremely small subset of structural linguistic problems in a > single language, how much more true must it be for the august, all-embracing, > universalist theory advanced by MIT linguists? To say > nothing of all its > cognitive this and that spinoffs? A French friend tells me the > manual for > French-English conversion of articles looks like a small law book, > which even then is sure to have exceptions and omissions. If after > decades of > detailed rule-seeking and measurements and busy work on the "syntactic > structures" of minute language byways our current school of > linguists can't > solve this problem, then what can they solve? > > very best to all! > > alex > > > ----- Original Message ----- > From: "clements" > To: > Cc: > Sent: Friday, August 27, 2004 11:47 AM > Subject: [FUNKNET] extension of "the" > > > > Dear Funknetters, > > Does anyone know of any studies on the extension of the use of > "the". In > > her home town (Stafford VA), a student of mine noted that "the" > can be > > used: > > > > --With most acronyms > > I have the AOL. > > She has the SARS. > > > > --With generics > > I like the coffee/the candy. (to refer to all coffee or candy) > > > > --With many proper place names. These tend to be specific > references, > especially the store names. If my friend told me she > was going to "the > > Pier 1," I would understand that she meant the Pier 1 in Central > Park. > We are going to the Nashville. > > I'm in the Target. > > He bought it at the Pier 1.
> > > > I have heard it reported with abstract nouns, as in > > > > I have the diabetes > > > > and a colleague of mine in Fort Wayne IN reported hearing it > from his > > students. > > > > Any leads would be most welcome. If there's interest, I'll > write up a > > summary. > > > > Clancy Clements > > > > > > > > > > > From rmalouf at mail.sdsu.edu Mon Aug 30 22:37:20 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Mon, 30 Aug 2004 15:37:20 -0700 Subject: The Chinese Diplomat's "the" In-Reply-To: <013001c48ebf$850a02a0$1b999c04@user1sznx2zyoc> Message-ID: On Mon, 2004-08-30 at 11:31, Alexander Gross wrote: > > The amazing thing is that this actually works! If we take a corpus, > > strip out all the articles, and use the system to try to recover them, > > it's right almost 85% of the time. > > I'm disappointed to see that claims like "it's right almost 85% of the time" > are still being advanced by MT advocates. I'm no MT advocate -- my personal feeling is that MT is impossible, but there are enough people smarter than me who disagree that I hesitate to say that in public. The original motive for the paper that I cited was an adaptive communication device. It had nothing to do with MT. And, in case I didn't make it clear, the "right almost 85% of the time" was for a narrowly defined task, namely recovering omitted articles in monolingual English texts. For that task, according to the results they published, it really is right almost 85% of the time. Unless you are accusing the authors of fraud, I don't see there is any evidence of "innumeracy" here, spreading or otherwise. Absolutely no claims are being made about MT, or how well this program would perform as a component of an MT system, or really even whether a program like this is useful for anything. However, I am making the claim based on this paper (though the authors might not endorse it) that most of the time selecting which article to use in a given context isn't very hard. 
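[Editorial illustration, not part of the original message.] The strip-and-recover task described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the system from the cited paper: the function names (`slots`, `train`, `predict`, `accuracy`), the two-word context window, the back-off order, and the five-sentence reference corpus are all hypothetical choices made for illustration, and "an" is simply folded into "a".

```python
from collections import Counter, defaultdict

# "an" is treated as a spelling variant of "a"; "" stands for "no article".
ARTICLES = {"the": "the", "a": "a", "an": "a"}

def slots(sentence):
    """Strip the articles from a sentence and return one
    (previous_word, next_word, original_article) triple per remaining word."""
    out = []
    prev, pending = "<s>", ""
    for tok in sentence.lower().split():
        if tok in ARTICLES:
            pending = ARTICLES[tok]   # remember the article, but hide it from the context
            continue
        out.append((prev, tok, pending))
        prev, pending = tok, ""
    return out

def train(corpus):
    """Count which article (if any) was seen in each context in the reference corpus."""
    by_pair, by_next, overall = defaultdict(Counter), defaultdict(Counter), Counter()
    for sent in corpus:
        for prev, nxt, label in slots(sent):
            by_pair[(prev, nxt)][label] += 1
            by_next[nxt][label] += 1
            overall[label] += 1
    return by_pair, by_next, overall

def predict(model, prev, nxt):
    """Pick the most frequent article for the most specific context that was seen,
    backing off from (prev, next) to next alone to the overall distribution."""
    by_pair, by_next, overall = model
    for counts in (by_pair.get((prev, nxt)), by_next.get(nxt), overall):
        if counts:
            return counts.most_common(1)[0][0]
    return ""

def accuracy(model, test_sentences):
    """Fraction of slots in held-out text where the recovered article matches the original."""
    hits = total = 0
    for sent in test_sentences:
        for prev, nxt, label in slots(sent):
            hits += predict(model, prev, nxt) == label
            total += 1
    return hits / total if total else 0.0

# Hypothetical toy reference corpus.
reference = [
    "please get the car",
    "please get the car",
    "please get a car",
    "he parked the car",
    "please get coffee",
]
model = train(reference)
print(predict(model, "get", "car"))           # the more frequent pattern wins
print(accuracy(model, ["please get a car"]))  # the valet's request is the slot it gets wrong
```

On this toy data the sketch recovers "the" for "get ___ car" and so misses the one slot in "please get a car", which is the valet case raised earlier in the thread; nothing about the 85% figure should be read off an example this small.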
-- Rob Malouf Department of Linguistics and Oriental Languages San Diego State University From hdls at unm.edu Mon Aug 30 23:01:14 2004 From: hdls at unm.edu (High Desert Linguistics Society) Date: Mon, 30 Aug 2004 17:01:14 -0600 Subject: Final Call for HDLS-6 Conference (Nov. 4-6, 2004) Message-ID: The Sixth High Desert International Linguistics Conference will be held at the University of New Mexico, Albuquerque, NM, November 4 -6, 2004. The invited keynote speakers are Joan Bybee (University of New Mexico), David McNeill (University of Chicago), and Suzanne Kemmer (Rice University). We invite you to submit proposals for 20-minute talks with 10-minute discussion sessions in any area of linguistics - especially those from a cognitive / functional linguistics perspective Papers in the following areas are particularly welcome: Evolution of language, Grammaticization, Metaphor & Metonymy, Language change & variation, Sociolinguistics, Bilingualism, Signed languages, Gesture, Native American languages, Language acquisition and Computational Linguistics. The deadline for submitting abstracts is September 3rd, 2004. Abstracts should be sent via email, as an attachment, to hdls at unm.edu. Please include the title "HDLS-6 abstract "in the subject line. MS-Word format is preferred or RTF if necessary. The e-mail and attached abstract must include the following: 1. Author's Name(s) 2. Author's Affiliation(s) 3. Title of the Paper 4. E-mail address of the primary author The abstract should be no more than one page and no less than 11-point font. A second page is permitted for references and/or data. Only two submissions per author will be accepted and we will only consider submissions that conform to the above guidelines. Notification of acceptance will be sent out by the evening of September 5th, 2004 If you have any questions or need for further information please contact us at hdls at unm.edu with "HDLS-6 Conference" in the subject line. 
From hstahlke at bsu.edu Tue Aug 31 02:05:21 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 21:05:21 -0500 Subject: "the" (3) Message-ID: I agree overall with your analysis. However, I've checked Thayer's Greek-English Lexicon of the New Testament, which also includes Septuagint and Hellenistic sources. ho christos shows up all over the S in its "anoint as consecration" meaning: Ps.114:15, Ps.2:2, Hab.3:13, all over Samuel/Kings/Chronicles, and even in reference to a foreign king, Cyrus, in Is.45:1. The word did not have this meaning in pre-Christian non-Jewish writing, but pre-Christian Hellenistic Judaism did extend the secular meaning to its sacred needs, antedating and perhaps establishing NT usage a couple of centuries earlier. Herb Subject: Re: [FUNKNET] "the" (3) In a message dated 8/30/04 1:32:51 PM, hstahlke at bsu.edu writes: << But I don't have a Classical Greek concordance handy, so I don't know how it would have been used in that body of literature where a notion of messiah didn't exist. >> In Liddell-Scott, the first Greek references to "anoint as a consecration" are Christian. And I don't see it as an epithet in Greek before Christ. Before that it's mainly about smearing oil on the body or white-washing a house or stucco -- nothing particularly religious. The meaning of "anointing" in Greek seems pretty concrete and mundane at an earlier time. So here, it seems, is a Greek word that changed drastically in its main meaning when it was used to translate a foreign word. A small lesson perhaps in how new ideas travel as a change in words.
Steve From language at sprynet.com Tue Aug 31 07:59:47 2004 From: language at sprynet.com (Alexander Gross) Date: Tue, 31 Aug 2004 03:59:47 -0400 Subject: The Chinese Diplomat's "the" Message-ID: Good, Rob, glad to hear you think it's impossible, though that's probably not the whole story either, and as the source i cited mentioned, all the work that has been done (& all the billions of $ spent so far) could end up helping translators to work more efficiently, though CAT & TM already do this. The real kicker is that even if they finally perfect MT, the only people who will be able to handle the system & make corrections will end up being human translators, or at least those human translators willing to work with it. No, i'm certainly not accusing the authors of fraud. But i do have to tell you that there have been some genuine instances of fraudulent demos in this field, documented back in the 80s in the pages of Language Technology (the precursor of WIRED Magazine), which i wrote for at the time. Also, on one occasion three of us, the UN's MT & Terminology expert, the president of the NY Circle of Translators, and myself, had no choice but to show up for a press conference promoting a blatantly fraudulent MT system. Fortunately the press managed to figure it out for themselves, and we didn't have to say very much. The person behind that system just might be one of those people you feel is smarter than you--or perhaps the teacher of some of them. What's more, all through the late 80s one MT company ran ads promising that with their system monolinguals will perform "truly automatic translation .....without assistance from bilinguals, polyglots or post-editors.....but meeting the quality standards of professional translators-no less." That guy is still quite active in the field but now promises no more than further improvements in TM (Translation Memory). 
> I'm no MT advocate -- my personal feeling is that MT is impossible, but > there are enough people smarter than me who disagree that I hesitate to > say that in public. The original motive for the paper that I cited was > an adaptive communication device. It had nothing to do with MT. And, > in case I didn't make it clear, the "right almost 85% of the time" was > for a narrowly defined task, namely recovering omitted articles in > monolingual English texts. For that task, according to the results they > published, it really is right almost 85% of the time. Unless you are > accusing the authors of fraud, I don't see there is any evidence of > "innumeracy" here, spreading or otherwise. > > Absolutely no claims are being made about MT, or how well this program > would perform as a component of an MT system, or really even whether a > program like this is useful for anything. However, I am making the > claim based on this paper (though the authors might not endorse it) that > most of the time selecting which article to use in a given context isn't > very hard. There i certainly agree with you. But remember, it isn't very hard for you & me, but it's bewilderingly difficult for many ESL & translator-training students. Anyway, 85% still isn't going to cut it, and i can't help wondering if they asked their system to choose only between definite and indefinite articles, in which case the law of averages would already credit both alternatives with 50%. Even if they allowed for zero grade articles, that would still give all three alternatives a 33% free boost before the test went further. very best! 
alex > -- > Rob Malouf > Department of Linguistics and Oriental Languages > San Diego State University > > > From Salinas17 at aol.com Tue Aug 31 14:40:55 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Tue, 31 Aug 2004 10:40:55 EDT Subject: The Chinese Diplomat's "the" (3) Message-ID: In a message dated 8/30/04 4:37:17 PM, language at sprynet.com writes: << Contrary to Steve's fantasies that all language can be broken down to Roger Schank-like scenarios involving dialogues with car valets, both grammar and accent really do matter in most languages. >> Well, obviously a problem with my scenario would be that it gave Alex the impression that I was saying grammar and accent don't matter. (Reminded me of one of the more memorable Roger Schank lines: "People don't remember what you say. They remember what they say.") One of my points was that there are actually two different kinds of "bad grammar." There's one kind that makes my speech incomprehensible to listeners. There's another kind that sounds wrong "grammatically" but is nevertheless understandable by listeners. (Time for more scenarios.) A child recently told me that he "waked up in the morning..." I corrected him but understood what he was saying. That's bad grammar that doesn't directly interfere with communication, except to the extent that it distracts or affects the willingness of the listener to listen. However, the Chinese diplomat scenario appears to teach us that whether grammar is faulty can often depend on non-linguistic factors (i.e., whether the embassy owns many cars or just one car -- ie, "get a car" or "get the car"). Some sociolinguists have had a habit of calling these non-linguistic factors "context", in the sense of surrounding circumstances. But the fact is they are the core reason we are speaking in the first place. If our diplomat has no interest in cars, he should logically have nothing to say and the correct article and other grammar problems do not arise. 
What Rob originally wrote was: "At any rate, the performance of the best [computer] models is getting close to that of humans at guessing which article will be used in a given context." What I was challenging in that statement was how a computer could know "context" -- the non-linguistic ingredients in the soup. From what I can tell, the computer thinks "get the car" is more likely than "get a car" because "get the car" or something like it has been more likely in the past. This is not "context" in the sense of reference, which involves non-linguistic factors. It's "context" in the sense of word sequence and adjacency history and constraints on sentence structure. That's an important difference in terminology and one I thought worth mentioning. It seems to confuse the computer generated language issues a lot. Particularly because "a car" versus "the car" is NOT always a matter that can be solved without looking outside language and in the real world. The parking valet teaches us that. A machine cannot solve that problem on its own. It just doesn't know whether "a car" or "the car" is correct in that circumstance. It doesn't know whether the diplomat should choose one or the other. And of course we can't say which is correct unless we also have such knowledge. Alex also writes: <<... just as i am concerned with ...breaking through to describing how language actually works. >> Let me suggest a place to start. A friend recently received a phone message from a colleague with a strong Southern accent. She and I could make out at best five words out of two dozen. We're all competent native English speakers, but the message to us was incomprehensible. That's an example of when language "actually doesn't work" though it should. Let me suggest that explaining why it didn't work might go a long way towards explaining how it works, when it does work.
BTW, there's a humorous piece on the web about "the THE" by Peter Master at: http://aaal.lang.uiuc.edu/letter/23.2/theology.html Regards, Steve Long From hdls at unm.edu Mon Aug 2 18:54:32 2004 From: hdls at unm.edu (High Desert Linguistics Society) Date: Mon, 2 Aug 2004 12:54:32 -0600 Subject: Second call for the HDLS-6 Linguistics Conference (Nov. 4-6, 2004) at the Univ of New Mexico Message-ID: Please note that the conference will take place from November 4th to the 6th and not the 3rd - 5th as indicated in the first posting. -------------------------------------------------- The Sixth High Desert International Linguistics Conference will be held at the University of New Mexico, Albuquerque, NM, November 4 -6, 2004. We invite you to submit proposals for 20-minute talks with 10-minute discussion sessions in any area of linguistics - especially those from a cognitive / functional linguistics perspective Papers in the following areas are particularly welcome: Evolution of language, Grammaticization, Metaphor & Metonymy, Language change & variation, Sociolinguistics, Bilingualism, Signed languages, Gesture, Native American languages, Language acquisition and Computational Linguistics. The deadline for submitting abstracts is September 3rd, 2004. Abstracts should be sent via email, as an attachment, to hdls at unm.edu. Please include the title "HDLS-6 abstract "in the subject line. MS-Word format is preferred or RTF if necessary. The e-mail and attached abstract must include the following: 1. Author's Name(s) 2. Author's Affiliation(s) 3. Title of the Paper 4. E-mail address of the primary author The abstract should be no more than one page and no less than 11-point font. A second page is permitted for references and/or data. Only two submissions per author will be accepted and we will only consider submissions that conform to the above guidelines. 
Notification of acceptance will be sent out by the evening of September 5th, 2004 If you have any questions or need for further information please contact us at hdls at unm.edu with "HDLS-6 Conference" in the subject line. From spike at darkwing.uoregon.edu Wed Aug 4 20:52:00 2004 From: spike at darkwing.uoregon.edu (Spike Gildea) Date: Wed, 4 Aug 2004 13:52:00 -0700 Subject: Call for Papers Message-ID: 79th Annual Meeting of the Linguistic Society of America 6-9 January 2005 San Francisco, CA Contact: MReynolds at lsadc.org Further information: http://www.lsadc.org Abstract Deadline: 1 SEPTEMBER 2004 Meeting Description The 79th Annual Meeting of the Linguistic Society of America will be held at the Hyatt Regency San Francisco, 6-9 January 2005. The American Dialect Society, the American Name Society, the North American Association for the History of the Language Sciences, the Society for Pidgin and Creole Linguistics, and the Society for theStudy of the Indigenous Languages of the Americas will meet concurrently with the LSA. On the program will be plenary presentations by Penny Eckert, Victor Golla, Peter Ladefoged, and George Lakoff. The Presidential Address will be given by Joan Bybee. Call for Papers All members of the LSA are invited to submit abstracts for 15-minute, 30-minute, and poster presentations. Membership is a requirement for submitting and presenting; dues for 2004 ($80 US regular, $40 US student; $100 non-US regular, $60 non-US student) may accompany submissions. Submittal forms and the guidelines and specifications for abstracts may be found in the June LSA Bulletin or at http://www.lsadc.org. The guidelines for abstract preparation must be rigorously adhered to for the abstracts to be considered by the Program Committee. The deadline for receipt of all abstracts is 1 September 2004, 5:00 PM EDT. Submissions should be addressed to: LSA Secretariat, 1325 18th Street, NW, Suite 211, Washington, DC 20036-6501. 
Members are advised that post office delivery, including express mail and priority mail, is erratic. Abstracts received after the deadline will not be considered and will be returned to the authors. Strict enforcement of this deadline is necessary.

From francisco.ruiz at dfm.unirioja.es Fri Aug 6 00:02:22 2004
From: francisco.ruiz at dfm.unirioja.es (Francisco Ruiz de Mendoza)
Date: Fri, 6 Aug 2004 01:02:22 +0100
Subject: ARCL-3 Call for papers
Message-ID:

The Annual Review of Cognitive Linguistics (published under the auspices of the Spanish Cognitive Linguistics Association) aims to establish itself as an international forum for the publication of high-quality original research on all areas of linguistic enquiry from a cognitive perspective. Fruitful debate is encouraged with neighboring academic disciplines as well as with other approaches to language study, particularly functionally-oriented ones.

Submissions for ARCL-3 (2005) should be received before November 30, 2004.

For submission guidelines, visit: http://www.unirioja.es/dptos/dfm/ARCLGuidelines.pdf
For general information, go to: http://www.benjamins.com/cgi-bin/t_seriesview.cgi?series=ARCL

Francisco J. RUIZ DE MENDOZA - Editor
Universidad de La Rioja
Departamento de Filologías Modernas
Edificio de Filología
c/San José de Calasanz s/n
Campus Universitario
26004, Logroño, La Rioja, Spain
Tel.: 34 (941) 299433 / (941) 299430
FAX.: 34 (941) 299419
e-mail: francisco.ruiz at dfm.unirioja.es

From Salinas17 at aol.com Fri Aug 20 13:44:49 2004
From: Salinas17 at aol.com (Salinas17 at aol.com)
Date: Fri, 20 Aug 2004 09:44:49 EDT
Subject: On the Relativity Front
Message-ID:

COGNITION: Life Without Numbers in the Amazon (p.
1093)
---------------------------------------------------------------------------
Constance Holden

In an article published online this week by Science (www.sciencemag.org/cgi/content/abstract/1094492), a psycholinguist demonstrates that among members of a tiny tribe in the Amazon jungle that has no words for numbers beyond two, the ability to conceptualize numbers is no better than it is among pigeons, chimps, or human infants.

Full story at http://www.sciencemag.org/cgi/content/full/305/5687/1093a?etoc
------------------------------------

But of course this does not eliminate the possibility that the number two is one of those prelinguistic schema.

Regards,
Steve Long

From Salinas17 at aol.com Fri Aug 20 15:55:37 2004
From: Salinas17 at aol.com (Salinas17 at aol.com)
Date: Fri, 20 Aug 2004 11:55:37 EDT
Subject: Fwd: [FUNKNET] On the Relativity Front
Message-ID:

From Salinas17 at aol.com Fri Aug 20 19:19:13 2004
From: Salinas17 at aol.com (Salinas17 at aol.com)
Date: Fri, 20 Aug 2004 15:19:13 EDT
Subject: FWD: On the Relativity Front (2)
Message-ID:

Forwarded Message:
Subj: Re: [FUNKNET] On the Relativity Front
Date: Friday, August 20, 2004 9:56:04 AM
From: daniel.everett at uol.com.br
To: Salinas17 at aol.com
cc: dan.everett at man.ac.uk, funknet at rice.edu

From: daniel.everett at uol.com.br (Daniel Everett)
To: Salinas17 at aol.com
CC: dan.everett at man.ac.uk (Daniel L.Everett), funknet at rice.edu

If anyone wants to discuss this case about Piraha further, I would be happy to do so. Peter Gordon did this work with me, originally going to run experiments to check out and refine my own anecdotes about lack of counting and numbers in Piraha. Gordon's conclusion is Whorfian. However, I disagree with this.
In a larger article on my website - 'Cultural Constraints on Grammar and Cognition in Piraha', I discuss the number findings in a larger context, arguing that there are several ways in which Piraha culture constrains grammar (Piraha is also, I claim, the only language without embedding, for example, it lacks color words, it has the simplest kinship system known, the smallest phonemic inventory, etc.). I offer a single account for all of this based on the cultural constraint against talking about things outside the immediate experience of members of the community. Dan On 20 Aug 2004, at 10:44, Salinas17 at aol.com wrote: > COGNITION: Life Without Numbers in the Amazon (p. 1093) > ----------------------------------------------------------------------- > ---- > Constance Holden > > In an article published online this week by Science > (www.sciencemag.org/cgi/content/abstract/1094492), a psycholinguist > demonstrates that among members of a tiny tribe in the Amazon jungle > that > has no words for numbers beyond two, the ability to conceptualize > numbers > is no better than it is among pigeons, chimps, or human infants. > Full story at > http://www.sciencemag.org/cgi/content/full/305/5687/1093a?etoc > ------------------------------------ > But of course this not eliminate the possibility that the number two > is one > of those prelinguistic schema. > > Regards, > Steve Long > > ------------------------------- Daniel L. 
Everett Professor of Phonetics and Phonology Postgraduate Programme Director Department of Linguistics and English Language University of Manchester Manchester M13 9PL UK Fax: 44 161 275 3187 Phone: 44 161 275 3158 http://ling.man.ac.uk/info/staff/DE/DEHome.html From sathomps at linguistics.ucsb.edu Sat Aug 21 21:03:13 2004 From: sathomps at linguistics.ucsb.edu (Sandra Thompson) Date: Sat, 21 Aug 2004 14:03:13 -0700 Subject: UCSB job: computational/corpus linguistics Message-ID: The Linguistics Department of the University of California, Santa Barbara seeks to hire a specialist in computational and/or corpus linguistic approaches to language. The appointment will be tenure-track at the Assistant Professor level, effective July 1, 2005. We are especially interested in candidates whose research shows theoretical implications bridging computational and/or corpus linguistics and general linguistics, and who can interact with colleagues and students across disciplinary boundaries at UCSB. Candidates will be preferred whose research engages with the departmental focus on functional and usage-based approaches to explaining language. Research experience with corpora of naturally occurring language use is required. Candidates must have demonstrated excellence in teaching, and will be expected to teach a range of graduate and undergraduate courses in both computational/corpus linguistics and general linguistics. Ph.D. in linguistics or a related field such as cognitive science or computer science is required. Ph.D. normally required by the time of appointment. Applicants should submit hard copy of curriculum vitae, statement of research interests, 1-2 writing samples, and full contact information for three academic references to the Search Committee, Linguistics Department, UCSB, Santa Barbara, CA 93106-3100. Fax and email applications not accepted. Inquiries may be addressed to the above address or via email to lingsearch at linguistics.ucsb.edu. 
Tentative deadline is November 12, 2004. However, the position will remain open until filled. Preliminary interviews will be conducted at the Linguistic Society of America, although attendance is not required for consideration. The department is especially interested in candidates who can contribute to the diversity and excellence of the academic community through research, teaching and service. UCSB is an Equal Opportunity/Affirmative Action employer. From hdls at unm.edu Wed Aug 25 00:07:33 2004 From: hdls at unm.edu (High Desert Linguistics Society) Date: Tue, 24 Aug 2004 18:07:33 -0600 Subject: HDLS-6 Linguistics Conference (Nov. 4-6, 2004) Keynote speakers Message-ID: We are very pleased to announce that Joan Bybee (University of New Mexico), David McNeil (University of Chicago), and Suzanne Kemmer (Rice University) have accepted our invitations to deliver keynote addresses at The Sixth High Desert Linguistics Conference November 4 -6, 2004 at the University of New Mexico, Albuquerque, NM . We invite you to submit proposals for 20-minute talks with 10-minute discussion sessions in any area of linguistics - especially those from a cognitive / functional linguistics perspective Papers in the following areas are particularly welcome: Evolution of language, Grammaticization, Metaphor & Metonymy, Language change & variation, Sociolinguistics, Bilingualism, Signed languages, Gesture, Native American languages, Language acquisition and Computational Linguistics. The deadline for submitting abstracts is September 3rd, 2004. Abstracts should be sent via email, as an attachment, to hdls at unm.edu. Please include the title "HDLS-6 abstract "in the subject line. MS-Word format is preferred or RTF if necessary. The e-mail and attached abstract must include the following: 1. Author's Name(s) 2. Author's Affiliation(s) 3. Title of the Paper 4. E-mail address of the primary author The abstract should be no more than one page and no less than 11-point font. 
A second page is permitted for references and/or data. Only two submissions per author will be accepted and we will only consider submissions that conform to the above guidelines. Notification of acceptance will be sent out by the evening of September 5th, 2004. If you have any questions or need further information, please contact us at hdls at unm.edu with "HDLS-6 Conference" in the subject line.

From Julia.Ulrich at degruyter.com Wed Aug 25 09:05:05 2004
From: Julia.Ulrich at degruyter.com (Julia Ulrich)
Date: Wed, 25 Aug 2004 11:05:05 +0200
Subject: Inaugural Issue of Intercultural Pragmatics is now available
Message-ID:

Mouton de Gruyter is proud to announce the publication of

INTERCULTURAL PRAGMATICS
Edited by Istvan Kecskes

The articles of the first issue are available as free downloads at http://www.degruyter.de/journals/intcultpragm/intcultpragm1_1.html

ISTVAN KECSKES
Editorial: Lexical merging, conceptual blending, and cultural crossing

JACOB L. MEY
Between culture and pragmatics: Scylla and Charybdis? The precarious condition of intercultural pragmatics

JACQUES MOESCHLER
Intercultural pragmatics: a cognitive approach

BERT PEETERS
"Thou shalt not be a tall poppy": Describing an Australian communicative (and behavioral) norm

Interview

JÓZSEF ANDOR
The master and his performance: An interview with Noam Chomsky

Forum

RACHEL GIORA
On the Graded Salience Hypothesis

GABRIELE KASPER
Speech acts in (inter)action: Repeated questions

JAN NUYTS
The cognitive-pragmatic approach.

Contributors to this issue

For more information, please visit http://www.degruyter.de/rs/384_7078_ENU_h.htm

To order please contact
SFG Servicecenter-Fachverlage
Postfach 4343
72774 Reutlingen, Germany
Fax: +49 (0)7071 - 93 53 - 33
E-mail: deGruyter at s-f-g.com

For USA, Canada and Mexico:
Walter de Gruyter, Inc.
200 Saw Mill River Road
Hawthorne, NY 10532, USA
Fax: +1 (914) 747-1326
E-mail: cs at degruyterny.com

Please visit our website for other publications by Mouton de Gruyter: http://www.mouton-publishers.com

From geoffnathan at wayne.edu Wed Aug 25 13:26:04 2004
From: geoffnathan at wayne.edu (Geoff Nathan)
Date: Wed, 25 Aug 2004 09:26:04 -0400
Subject: Phonology Theme Session--Reminder
Message-ID:

Just a reminder that the deadline for abstract submission is rapidly approaching...

Phonology in the Cognitive Grammar Worldview
a theme session within the Ninth International Cognitive Linguistics Conference
Yonsei University in Seoul, Korea. 17-22nd of July 2005

We are seeking abstracts for papers exploring how fundamental principles of Cognitive Grammar (prototype theory, experiential grounding--"embodiment", principles of categorization, including the concept of the "basic level," and usage-based theories) can elucidate the organization of phonology in Language (either spoken or signed).
We invite abstracts from researchers in all areas of cognitive linguistics and related frameworks who are interested in the way the purely form-oriented, physical aspect of language is perceived, categorized, organized and produced.

Abstracts should be between 300 and 500 words, and clearly display their relevance to the topic. Abstracts should be submitted electronically (RTF or PDF format) to Geoffrey Nathan (geoffnathan at wayne.edu) and José Antonio Mompeán (mompean at um.es), and should reach us no later than August 30. Authors will be notified by September 5th whether their abstracts have been selected for the theme session. The theme session proposal will then be submitted to the organizers of the ICLC-9, who will notify us of acceptance or rejection by January 15th.

[apologies for cross-posting]

Geoffrey S. Nathan
Faculty Liaison, Computing and Information Technology, and Associate Professor of English
Linguistics Program
Department of English
Wayne State University
Detroit, MI, 48202

Phone Numbers
Computing and Information Technology: (313) 577-1259
Linguistics (English): (313) 577-8621
C&IT Fax: (313) 577-1338

From clements at indiana.edu Fri Aug 27 15:47:19 2004
From: clements at indiana.edu (clements)
Date: Fri, 27 Aug 2004 10:47:19 -0500
Subject: extension of "the"
Message-ID:

Dear Funknetters,
Does anyone know of any studies on the extension of the use of "the". In her home town (Stafford VA), a student of mine noted that "the" can be used:

--With most acronyms
I have the AOL.
She has the SARS.

--With generics
I like the coffee/the candy. (to refer to all coffee or candy)

--With many proper place names. These tend to be specific references, especially the store names. If my friend told me she was going to "the Pier 1," I would understand that she meant the Pier 1 in Central Park.
We are going to the Nashville.
I'm in the Target.
He bought it at the Pier 1.
I have heard it reported with abstract nouns, as in I have the diabetes and a colleague of mine in Fort Wayne IN reported hearing it from his students. Any leads would be most welcome. If there's interest, I'll write up a summary. Clancy Clements From john at research.haifa.ac.il Sat Aug 28 03:56:06 2004 From: john at research.haifa.ac.il (john at research.haifa.ac.il) Date: Sat, 28 Aug 2004 06:56:06 +0300 Subject: extension of "the" Message-ID: Dear Clancy, This is completely off the top of my head, but I noticed that 'the' is used with chain stores in Michigan ('the Kroger's') but not in the northeast (at least 10 years ago). In Philadelphia in the 1980's Blacks but not Whites said 'the AIDS'. John Myhill Quoting clements : > Dear Funknetters, > Does anyone know of any studies on the extension of the use of "the". In > her home town (Stafford VA), a student of mine noted that "the" can be > used: > > --With most acronyms > I have the AOL. > She has the SARS. > > --With generics > I like the coffee/the candy. (to refer to all coffee or candy) > > --With many proper place names. These tend to be specific references, > especially the store names. If my friend told me she was going to "the > Pier 1," I would understand that she meant the Pier 1 in Central Park. > We are going to the Nashville. > I'm in the Target. > He bought it at the Pier 1. > > I have heard it reported with abstract nouns, as in > > I have the diabetes > > and a colleague of mine in Fort Wayne IN reported hearing it from his > students. > > Any leads would be most welcome. If there's interest, I'll write up a > summary. 
> > Clancy Clements > > > > ------------------------------------------------------ This mail sent through IMP Webmail of Haifa University http://webmail.haifa.ac.il From language at sprynet.com Sun Aug 29 20:24:33 2004 From: language at sprynet.com (Alexander Gross) Date: Sun, 29 Aug 2004 16:24:33 -0400 Subject: extension of "the" Message-ID: > Does anyone know of any studies on the extension of the use of "the". > In her home town (Stafford VA), a student of mine noted that "the" > can be used: I find it fascinating that anyone would assume that "the" might have a "normal" use which could then be subject to extension. And that there would be any studies which could conceivably place its usage within any sort of normative range at all or explore the possible range of extensions. I wonder if this may be just one further offshoot from the illusion shared by many linguists that the guiding principles of language have been discovered, described, and even codified. Or in Steven Pinker's words, linguists have found "the single mental design underlying" all languages and "we all have the same minds." Four years ago I issued a challenge not only to all those on the sci.lang USENET newsgroup concerning this matter, it was in fact a repeat of the very challenge I had also issued a few years earlier to one of the foremost founders of AI, a master mathematician and a name so eminent as to require no further airing here (though the curious may discover it by running a Deja search on the sci.lang archives). Neither this expert nor the linguists on sci.lang were able to come up with a response to this challenge. I am now readdressing it to my colleagues on FUNKNET to discover if they will fare any better with it. The challenge went as follows: --------------------------------------------- Since you (singular and plural) imagine that it will one day be possible to construct an "adequate" machine translation system, here is *your* little assignment. It's easy, it's all in English. 
I want you to come up with the precise, practical rules by which we decide to put "the" in front of a noun as opposed to when we decide to put "a" or "an" in front of a noun as opposed to when we decide to put absolutely nothing ("zero-grade article") in front of a noun. Also: precisely when do we have a choice between two possible methods? Further requirement: the rule or rules you come up with have to work for ALL instances of putting articles in front of nouns. The rules should be so fool-proof and logically transparent that we can even make an expert system paradigm out of them, so that anyone who needed to know which rule to apply could simply consult the expert system and find the right answer. You'll need something like this for that "adequate" MT system--it will be crucial to spell out these rules for English, especially since some fairly different ones apply to almost any foreign language you can name. And even languages without articles as such, like Russian and Chinese, have a few quirks in this regard, to say nothing of the problems of translating all these languages into and out of English. Today's most advanced MT systems get all this wrong as often as right. But there's another and even better reason for coming up with a solution. I've tried this task more than once, so it's more than an idle riddle. I was first asked to come up with a solution by a Chinese senior revisor & computer linguist friend at the UN translation department who himself had trouble deciding which article to use. I was eager to solve it for him, and I was almost certain I could come up with the solution quite easily. I was also interested because some of my students in a translator-training course I was then teaching also asked me for the same solution. They really needed the answer, because they continually made mistakes with articles both in their writing and speech, which made it sound as though all they could manage was "broken English." 
And this is what many people think when foreigners get their articles wrong, either in speech or in translations. But these were perfectly literate & intelligent people--they just couldn't figure out the rules for English articles. The point here is not merely to come up with the usual explanation for this problem (which amounts to little more than saying "when something is definite, it takes the definite article, when something is indefinite, it...). The point IS to come up with a clear set of rules that can help foreigners to learn English. And beyond that can incidentally also serve as the basis for an "adequate" MT program. Perhaps you also will make the mistake of supposing--as I did--that this is a trivial problem. Believe me--it isn't. I had no trouble coming up with the first two or three rules, but there were still many inexplicable instances, where I had to say lamely to my students "Learn the Language." I ended up weaseling out by telling both my students and my friend at the UN to read the NY Times & other sources & try to figure out for themselves why "a" or "the" or neither one is used. As Martin Kay has pointed out, you can throw all the computing power in the world at MT and still come up empty. At what point does a trivial problem become an intractable one? -------------------------------- Let me reiterate that while this may look like a simple problem, it isn't. Using an If, Then, Else logical framework, I tried to build something like an expert system that could represent its terms but couldn't truly get beyond the first few rules. The permissible range for using our articles varies not merely between British and American English but within our own US variety according to differences of region, class, education, national origin, and age. It may even vary between members of the same family and over time within the usage of a single individual. 
And we're talking just about English here--imagine the complexities that arise when other languages are brought in. And since this is true for such an extremely small subset of structural linguistic problems in a single language, how much more true must it be for the august, all-embracing, universalist theory advanced by MIT linguists? To say nothing of all its cognitive this and that spinoffs? A French friend tells me the manual for French-English conversion of articles looks like a small law book, which even then is sure to have exceptions and omissions. If after decades of detailed rule-seeking and measurements and busy work on the "syntactic structures" of minute language byways our current school of linguists can't solve this problem, then what can they solve? very best to all! alex ----- Original Message ----- From: "clements" To: Cc: Sent: Friday, August 27, 2004 11:47 AM Subject: [FUNKNET] extension of "the" > Dear Funknetters, > Does anyone know of any studies on the extension of the use of "the". In > her home town (Stafford VA), a student of mine noted that "the" can be > used: > > --With most acronyms > I have the AOL. > She has the SARS. > > --With generics > I like the coffee/the candy. (to refer to all coffee or candy) > > --With many proper place names. These tend to be specific references, > especially the store names. If my friend told me she was going to "the > Pier 1," I would understand that she meant the Pier 1 in Central Park. > We are going to the Nashville. > I'm in the Target. > He bought it at the Pier 1. > > I have heard it reported with abstract nouns, as in > > I have the diabetes > > and a colleague of mine in Fort Wayne IN reported hearing it from his > students. > > Any leads would be most welcome. If there's interest, I'll write up a > summary. 
>
> Clancy Clements
>
>
>
>

From jrubba at calpoly.edu Sun Aug 29 20:50:17 2004
From: jrubba at calpoly.edu (Johanna Rubba)
Date: Sun, 29 Aug 2004 13:50:17 -0700
Subject: "the"
Message-ID:

Southern Californians are known for their use of "the" in front of freeway numbers: the 5, the 405, the 101, etc. I think this is mostly a Southern Cal. usage; heard less often in the northern half of the state. I imagine it comes from shortening "the 405 freeway". The areas that do not use the article leave out the word "freeway" and just say "take 405 south ... " Maybe someone else has attested facts on this variation.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Johanna Rubba
Associate Professor, Linguistics
English Department, California Polytechnic State University
One Grand Avenue • San Luis Obispo, CA 93407
Tel. (805)-756-2184 • Fax: (805)-756-6374 • Dept. Phone. 756-2596
E-mail: jrubba at calpoly.edu • Home page: http://www.cla.calpoly.edu/~jrubba
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From adamzero at uchicago.edu Sun Aug 29 20:55:52 2004
From: adamzero at uchicago.edu (adam e leeds)
Date: Sun, 29 Aug 2004 15:55:52 -0500
Subject: Underpinnings of functional linguistics?
Message-ID:

Greetings all,

I have a request to make of you, but to preface, a short introductory statement seems to be in order. I'm an undergraduate at the University of Chicago, soon to graduate and hopefully soon to enter a graduate program in anthropological linguistics. My interests include, painting with the broad brush, indexicality/deixis, reference maintenance, the dynamics of face-to-face interaction, information structure, reported discourse and cognitive development, anti-realist holist coherentist contextualist theories of mind, sign, and world, and epistemological issues in the social sciences.
My question is a basic one: Can any of you recommend for me good introductory and in depth functional treatments of linguistics (articles and book-length), touching on any or all of: the main tenets, assumptions, arguments for, and structures of, methodological issues, etc. There is a bewildering array of capital-lettered Functional Syntaxes out there, but I don't really know that they are the place to start. Thanks many times over in advance for your responses (which you might want to direct toward me, personally, rather than toward the list). Regards, Adam E. Leeds From rmalouf at mail.sdsu.edu Sun Aug 29 23:13:21 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Sun, 29 Aug 2004 16:13:21 -0700 Subject: extension of "the" In-Reply-To: <003101c48e06$33135730$79999c04@user1sznx2zyoc> Message-ID: On Aug 29, 2004, at 1:24 PM, Alexander Gross wrote: > Since you (singular and plural) imagine that it will one day be > possible to construct an "adequate" machine translation system, > here is *your* little assignment. It's easy, it's all in English. I > want you to come up with the precise, practical rules by which > we decide to put "the" in front of a noun as opposed to when > we decide to put "a" or "an" in front of a noun as opposed to > when we decide to put absolutely nothing ("zero-grade article") > in front of a noun. Also: precisely when do we have a choice > between two possible methods? While no one (that I know of) has written such rules, there's been considerable work on addressing this problem using machine learning and statistical models. For example, this paper reports some early experiments: http://citeseer.ist.psu.edu/minnen00memorybased.html I know they've improved on these results since then, but I can't find the reference off hand. At any rate, the performance of the best models is getting close to that of humans at guessing which article will be used in a given context. 
--- Rob Malouf rmalouf at mail.sdsu.edu Department of Linguistics and Oriental Languages San Diego State University From Salinas17 at aol.com Mon Aug 30 14:34:05 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 10:34:05 EDT Subject: The Chinese Diplomat's "the" Message-ID: In a message dated 8/29/04 4:25:21 PM, language at sprynet.com writes: << I was first asked to come up with a solution by a Chinese senior revisor & computer linguist friend at the UN translation department who himself had trouble deciding which article to use. >> In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: << At any rate, the performance of the best models is getting close to that of humans at guessing which article will be used in a given context. >> There's an irony to why one sees such adherence to structuralist criteria on the "functional" linguistics list. In most situations, of course, a computer model cannot possibly predict the use of "the" versus "a" unless it also reads minds. If Alex's Chinese diplomats are merely trying to avoid "Broken English", then should we assume that their English otherwise is 100% comprehensible? In other words, is it that they are never misunderstood, but are merely using an inappropriate "ungrammatical" English? And why would that trouble them? What is the consequence of a foreign diplomat speaking understandable but stylistically non-conforming English? Microsoft Word does a pretty good job of correcting inappropriate omission of an article before a singular noun. When I type in, "Will you please get car?", it tells me an article is missing and prompts me to choose between "a car" and "the car". It even tells me that one is definite and one is indefinite. No big deal. If a Chinese diplomat should say to his parking valet, "please get car", I don't imagine that the valet would interpret that as "any car" or "a car of your choosing". 
But if he showed up a few moments later with someone else's car, then we observers might definitely conclude there was "a failure to communicate," as the Boss says in Cool Hand Luke. If this misunderstanding were to persist, there might be a good practical reason for our diplomat to start using "please get THE car" so that the valet knows which car is being referred to. But I imagine a diplomat would also think his function as a diplomat would be best served if he were well-versed in English and did not omit an article where English speakers would use one. It would enhance his job security. But even in that case, the controlling variable is probably not what our diplomat thinks about his English or his use of an article in a sentence. The controlling variable is how listeners respond to his English.

A computer model that merely mimics human speech structure is rigged. How is it supposed to know whether I am referring to "a car" or "the car"? How is it to know my intention? The BIG trick we haven't reproduced is the one the UN parking valet performs. He knows that "please get car" refers to a specific car. And he only knows that because he can rule out the possibility that our diplomat means just any car. And the reason he knows that has more to do with the rules of car ownership and parking garages than it has to do with the rules of language.

The real function of language is nearly always extra-linguistic. The difference between "the" and "a" is most often determined out there in the real world, not in the closed loop of structural linguistics. The consequence of omitting an article in English or misusing one has more to do with what will happen the next time than with the rules of grammar. If "please get car" impresses on the valet that we are important foreign diplomats and yields quicker service, we may just keep using it -- even if we are not Chinese diplomats.
We shouldn't be fooled into thinking that, because we expect people to speak grammatically and they respond, that arbitrary grammar rules are somehow built into us. On the other hand, where grammar rules have clear communication advantages, that should be enough to explain them. Regards, Steve Long From Salinas17 at aol.com Mon Aug 30 15:16:03 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 11:16:03 EDT Subject: "the" (2) Message-ID: In a message dated 8/29/04 4:50:52 PM, jrubba at calpoly.edu writes: << Southern Californians are known for their use of "the" in front of freeway numbers: the 5, the 405, the 101, etc. I think this is mostly a Southern Cal. usage; heard less often in the northern half of the state. >> The use of "the" before proper names is something I heard in the midwest years ago and it showed up in for example Al Franken's impression of the Minnesota rabbi giving his ecumenical approval to songs about "the Santa Claus". David Letterman often drifted into it when he was getting colloquial ("How many of you have tried the Popeye's string beans, huh?"). I always took it as being somehow from the German usage. The usage makes an unusual appearance in a current Burger King commercial where a motivational-type spokesperson with a British accent says "the Burger King" in referring to the chain -- probably the first time that has happened in a national fast food commercial. I heard it also in the sports nicknaming youth jargon of the 80's ("the Stevester"). It strikes me that it made a startling appearance in the title of Mel Gibson's movie -- "The Passion of the Christ." In all of these cases, what its function appears to be is to take a proper name and elevate it to a categorical "status." Regards, Steve Long From hstahlke at bsu.edu Mon Aug 30 15:18:56 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) 
Date: Mon, 30 Aug 2004 10:18:56 -0500 Subject: "the" (2) Message-ID: I'm not sure that "the Christ" fits in with the other examples. The article is used there to make a particular theological point, in part, that "Christ" is not a name but a title, although there's more to it. Herb Stahlke In a message dated 8/29/04 4:50:52 PM, jrubba at calpoly.edu writes: << Southern Californians are known for their use of "the" in front of freeway numbers: the 5, the 405, the 101, etc. I think this is mostly a Southern Cal. usage; heard less often in the northern half of the state. >> The use of "the" before proper names is something I heard in the midwest years ago and it showed up in for example Al Franken's impression of the Minnesota rabbi giving his ecumenical approval to songs about "the Santa Claus". David Letterman often drifted into it when he was getting colloquial ("How many of you have tried the Popeye's string beans, huh?"). I always took it as being somehow from the German usage. The usage makes an unusual appearance in a current Burger King commercial where a motivational-type spokesperson with a British accent says "the Burger King" in referring to the chain -- probably the first time that has happened in a national fast food commercial. I heard it also in the sports nicknaming youth jargon of the 80's ("the Stevester"). It strikes me that it made a startling appearance in the title of Mel Gibson's movie -- "The Passion of the Christ." In all of these cases, what its function appears to be is to take a proper name and elevate it to a categorical "status." 
Regards, Steve Long From rmalouf at mail.sdsu.edu Mon Aug 30 15:22:27 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Mon, 30 Aug 2004 08:22:27 -0700 Subject: The Chinese Diplomat's "the" In-Reply-To: Message-ID: Hi, On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote: > In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: > << At any rate, the performance of the best models is getting close to > that > of humans at guessing which article will be used in a given context. >> > > There's an irony to why one sees such adherence to structuralist > criteria on > the "functional" linguistics list. In most situations, of course, a > computer > model cannot possibly predict the use of "the" versus "a" unless it > also reads > minds. It's hard for me to imagine anything less "structuralist" than an instance-based model like this one. The system produces an article for a sequence like "please get ___ car" by searching a reference corpus for similar patterns. If it finds sequences like "please get the car" more often than "please get a car" or "please get car", it produces a "the". The amazing thing is that this actually works! If we take a corpus, strip out all the articles, and use the system to try to recover them, it's right almost 85% of the time. This can be further improved somewhat by providing the system with an ontology of noun meanings (so it can draw generalizations about words which don't occur in the reference corpus but have very similar meanings to words which do). No, it's never going to be right 100% of the time, at least until we can read minds, but in most situations, very simple information about the context is all that's needed. A system like this has obvious applications for machine translation, but the reason we first got to thinking about this problem was in the context of an adaptive communication system. 
We were working with an ALS patient who was completely paralyzed: he couldn't speak, move, or even breathe on his own, but by moving his eyes he could spell out simple messages. This was very fatiguing for him, and the messages tended to be highly telegraphic: "please get the car" might well come out as "ge cr". His family could understand what he meant, but no one else could. This program for generating articles was part of a larger system to "translate" things like "ge cr" into fluent, polite English: "please get the car". You might think that this could only be done reliably with full mind reading ability and/or a vast store of general world knowledge, and it's easy to make up isolated examples where that's true. But, it turns out that in real life it can be done remarkably well using very simple tricks. So, yeah, if he'd ever wanted to tell a valet to "please get a car", the system would have inserted an unwanted "the". Fortunately, hardly anyone ever does that, so the problem doesn't come up very often. --- Rob Malouf rmalouf at mail.sdsu.edu Department of Linguistics and Oriental Languages San Diego State University From Salinas17 at aol.com Mon Aug 30 15:37:45 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 11:37:45 EDT Subject: "the" (2) Message-ID: In a message dated 8/30/04 11:18:46 AM, hstahlke at bsu.edu writes: << I'm not sure that "the Christ" fits in with the other examples. The article is used there to make a particular theological point, in part, that "Christ" is not a name but a title, although there's more to it. >> I'm sure you're correct. But consider how often that theological point could have been made in the media and elsewhere when "Christ" has been mentioned in the past. The departure in the name of a film is striking and may have filtered into other uses or perhaps reflect an on-going trend. I think it is more popularly "understandable" today than it would have been in, say, the '50's in America. 
And in that sense, the "title" is used to refer to what is usually treated as a proper name and altered to connote a status. In that, it has the same connotation as something as profane as "The Shaq." (Or in a version just recently heard -- "...The Albert Einstein of rap music.") Regards, Steve From hstahlke at bsu.edu Mon Aug 30 15:50:14 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 10:50:14 -0500 Subject: "the" (2) Message-ID: Steve, Given Gibson's background, I suspect he was using "the Christ" in its traditional theological sense, but given the theological background of many of those who made the movie into a cause, I suspect you are right. Herb In a message dated 8/30/04 11:18:46 AM, hstahlke at bsu.edu writes: << I'm not sure that "the Christ" fits in with the other examples. The article is used there to make a particular theological point, in part, that "Christ" is not a name but a title, although there's more to it. >> I'm sure you're correct. But consider how often that theological point could have been made in the media and elsewhere when "Christ" has been mentioned in the past. The departure in the name of a film is striking and may have filtered into other uses or perhaps reflect an on-going trend. I think it is more popularly "understandable" today than it would have been in, say, the '50's in America. And in that sense, the "title" is used to refer to what is usually treated as a proper name and altered to connote a status. In that, it has the same connotation as something as profane as "The Shaq." 
(Or in a version just recently heard -- "...The Albert Einstein of rap music.") Regards, Steve From Salinas17 at aol.com Mon Aug 30 16:01:59 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 12:01:59 EDT Subject: The Chinese Diplomat's "the" (2) Message-ID: In a message dated 8/30/04 11:22:51 AM, rmalouf at mail.sdsu.edu writes: << So, yeah, if he'd ever wanted to tell a valet to "please get a car", the system would have inserted an unwanted "the". Fortunately, hardly anyone ever does that, so the problem doesn't come up very often. >> "...get a car." It is what I say all the time in reference to rental cars at the airport. And guys like Tony Soprano might say it with regard to the cars they want gotten. You're working with a limited context. In any case, the actual odds are extra-linguistic. Otherwise they are 50-50 to a machine that knows nothing about the issues of car ownership or how many cars are in the family garage and what options are being offered by saying "get a car" versus "get the car." << It's hard for me to imagine anything less "structuralist" than an instance-based model like this one. The system produces an article for a sequence like "please get ___ car" by searching a reference corpus for similar patterns. >> It is completely structural in how it gets to output. That's not to say you are not doing a good thing in practical terms. But the fact that you've found predictability in the patterns of speech doesn't necessarily provide an explanation of those patterns -- other than perhaps we are in the habit of talking about the same things for the same reasons in the same ways from day to day. If your "speaker" was misunderstood despite the machine being accurate, that would be a "functional" matter. If the function of speech is communication, we can presume that a variety of structures might achieve the same understanding -- e.g., "get the car [I want to go for a ride]" or "get a car [I want to go for a ride]" or "I want to go for a ride". SLong From ellen at central.cis.upenn.edu Mon Aug 30 16:04:14 2004 From: ellen at central.cis.upenn.edu (Ellen F. 
Prince) Date: Mon, 30 Aug 2004 12:04:14 EDT Subject: The Chinese Diplomat's "the" In-Reply-To: Your message of "Mon, 30 Aug 2004 08:22:27 PDT." <66D6AE42-FA98-11D8-BA73-000D932A40AE@mail.sdsu.edu> Message-ID: R. Malouf writes: >Hi, > >On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote: >> In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: >> << At any rate, the performance of the best models is getting close to >> that >> of humans at guessing which article will be used in a given context. >> >> >> There's an irony to why one sees such adherence to structuralist >> criteria on >> the "functional" linguistics list. In most situations, of course, a >> computer >> model cannot possibly predict the use of "the" versus "a" unless it >> also reads >> minds. > >It's hard for me to imagine anything less "structuralist" than an >instance-based model like this one. The system produces an article for >a sequence like "please get ___ car" by searching a reference corpus >for similar patterns. If it finds sequences like "please get the car" >more often than "please get a car" or "please get car", it produces a >"the". > >The amazing thing is that this actually works! If we take a corpus, >strip out all the articles, and use the system to try to recover them, >it's right almost 85% of the time. This can be further improved >somewhat by providing the system with an ontology of noun meanings (so >it can draw generalizations about words which don't occur in the >reference corpus but have very similar meanings to words which do). >No, it's never going to be right 100% of the time, at least until we >can read minds, but in most situations, very simple information about >the context is all that's needed. This may be an attractive solution for producing software for the market -- but it is simply hilarious as any sort of model of how humans use language. Imagine two company robots flying to a remote destination together. 
One has the kind of software you are describing; the other has human-like competence in the use of articles. After collecting their baggage, the one with your (kind of) software says to the other one, 'I've just realized that we need the car, please.' Being an obedient robot and understanding the request as a human would, the requestee boards the next flight back home, since the only thing s/he/it can infer from _the car_ in this context is their company car... The fact that people typically drive their own car, which is Hearer-known or Inferrable and hence typically definite, more often than a rental car, which can be Hearer-new and hence typically indefinite, is profoundly irrelevant to human language processing/competence -- even if it'll get the software developer safely thru a demo (almost) 85 out of 100 times... And, by the way, to deal with linguistic reference, we only have to 'read minds' as well as the average speaker does -- i.e. not at all. What we need is a large and relevant knowledge-base and a system of plausible reasoning, both needed anyway for other aspects of AI, as well as some form-function correspondences for each language. IOW, we need what language users have. Ellen Prince From ellen at central.cis.upenn.edu Mon Aug 30 16:13:46 2004 From: ellen at central.cis.upenn.edu (Ellen F. Prince) Date: Mon, 30 Aug 2004 12:13:46 EDT Subject: "the" (2) In-Reply-To: Your message of "Mon, 30 Aug 2004 11:37:45 EDT." <158.3dcf1d6e.2e64a3c9@aol.com> Message-ID: _Christ_ may be considered a title but it's ultimately a common noun (or adjective used as such) meaning 'anointed (one)'. I would imagine that it's that sense that's being emphasized when the article is used, as in Mel Gibson's movie title. Ellen Prince From hstahlke at bsu.edu Mon Aug 30 16:18:11 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 11:18:11 -0500 Subject: "the" (2) Message-ID: Correct, as a Greek translation of Aramaic meshiha, Hebrew mashah. 
Herb -----Original Message----- From: funknet-bounces at mailman.rice.edu [mailto:funknet-bounces at mailman.rice.edu] On Behalf Of Ellen F. Prince Sent: Monday, August 30, 2004 11:14 AM To: funknet at mailman.rice.edu Subject: Re: [FUNKNET] "the" (2) _Christ_ may be considered a title but it's ultimately a common noun (or adjective used as such) meaning 'anointed (one)'. I would imagine that it's that sense that's being emphasized when the article is used, as in Mel Gibson's movie title. Ellen Prince From Salinas17 at aol.com Mon Aug 30 16:41:09 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 12:41:09 EDT Subject: "the" (2) Message-ID: In a message dated 8/30/04 12:18:04 PM, hstahlke at bsu.edu writes: << Correct, as a Greek translation of Aramaic meshiha, Hebrew mashah. >> Herb- Was chrio:/christos ever used in Greek before Christianity in the sense of "anoint in consecration" or did it ever appear as a title (proper noun)? In other words, did the translation also carry a new meaning into Greek? I think earlier Hebrew kings were also called "the anointed ones" in Hebrew. Steve From rmalouf at mail.sdsu.edu Mon Aug 30 17:08:36 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Mon, 30 Aug 2004 10:08:36 -0700 Subject: The Chinese Diplomat's "the" (2) In-Reply-To: Message-ID: On Mon, 2004-08-30 at 09:01, Salinas17 at aol.com wrote: > In a message dated 8/30/04 11:22:51 AM, rmalouf at mail.sdsu.edu writes: > << So, yeah, if he'd ever wanted to tell a valet to "please get a car", the > system would have inserted an unwanted "the". Fortunately, hardly anyone ever > does that, > so the problem doesn't come up very often. >> > > "...get a car." It is what I say all the time in reference to rental cars at > the airport. And guys like Tony Soprano might say it with regard to the cars > they want gotten. You're working with a limited context. In any case, the > actual odds are extra-linguistic. 
Why draw a distinction between linguistic and extra-linguistic factors? I thought we were functionalists here! :-) As I said, it's easy to construct examples which confound a system like this. The striking thing is that such examples are fairly rare in actual language use. A very simple program is able to guess the right article for 85% of the common nouns from a sample of the Wall Street Journal. Of the remaining 15%, some of the articles generated by the system would work just as well as the original one in the text, so the actual rate of "wrong" predictions is somewhat less than 15%. And, of the remaining errors, many would be resolved correctly if we just had a larger reference corpus. As a linguist, I think the fact that such an obviously inadequate system performs as well as it does is interesting. Not because it gives us a plausible model of human language processing, but because it gives an empirical measure of just how rare the truly hard cases are. > << It's hard for me to imagine anything less "structuralist" than an > instance-based model like this one. The system produces an article for a sequence like > "please get ___ car" by searching a reference corpus for similar patterns. >> > > It is completely structural in how it gets to output. How so? There's no grammar or grammaticality, no rules or categories, no notion of contrastive or complementary distribution. There is a gradient measure of sequence similarity, which I guess is a bit like the structuralist idea of an opposition, but it's not one I would expect Saussure or Bloomfield to endorse. True, the task the system was evaluated on is structuralistish, but that's only because it's easy to measure the results of, and since it's at least as hard as the task we really care about (finding an article which does the right thing in a given context), it gives us an upper bound on the error rate. 
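The corpus-lookup idea under discussion can be illustrated with a small Python sketch. This is not the memory-based system described in this thread, which uses a gradient measure of sequence similarity; it is a simplified stand-in that counts exact matches of the word before and the word after each slot, and backs off to the following word alone when the full context is unseen. The function names (`slots`, `train`, `predict`, `restore`) and the four-sentence reference corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

ARTICLES = {"the", "a", "an"}
NONE = ""  # marks a slot where no article occurs


def slots(tokens):
    """Yield (previous word, following word, article) for each slot in a sentence."""
    words = [t.lower() for t in tokens]
    prev, i = "<s>", 0
    while i < len(words):
        if words[i] in ARTICLES and i + 1 < len(words):
            # An article followed by a word: record the article for that slot.
            yield prev, words[i + 1], words[i]
            prev = words[i + 1]
            i += 2
        else:
            # A word with no article in front of it.
            yield prev, words[i], NONE
            prev = words[i]
            i += 1


def train(corpus):
    """Count articles per context in a reference corpus (a list of sentences)."""
    full = defaultdict(Counter)     # (prev, next) -> article counts
    backoff = defaultdict(Counter)  # next -> article counts
    for sentence in corpus:
        for prev, nxt, art in slots(sentence.split()):
            full[(prev, nxt)][art] += 1
            backoff[nxt][art] += 1
    return full, backoff


def predict(model, prev, nxt):
    """Return the most frequent article for the slot, or NONE if nothing matches."""
    full, backoff = model
    for counts in (full.get((prev, nxt)), backoff.get(nxt)):
        if counts:
            return counts.most_common(1)[0][0]
    return NONE


def restore(model, stripped):
    """Re-insert articles into a sentence from which they were stripped."""
    out, prev = [], "<s>"
    for word in stripped.split():
        art = predict(model, prev, word.lower())
        if art:
            out.append(art)
        out.append(word)
        prev = word.lower()
    return " ".join(out)


if __name__ == "__main__":
    reference = [
        "please get the car",
        "please get the car now",
        "please get a car",
        "get the keys",
    ]
    model = train(reference)
    print(restore(model, "please get car"))  # please get the car
```

Because "get the car" outnumbers "get a car" in the toy corpus, the sketch always produces "the" in that slot, which is exactly the behaviour both sides of the argument are describing: right most of the time on frequent patterns, and wrong whenever the speaker means the rarer reading.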
[Actually, to be honest, if you read the fine print, some notion of category does get smuggled in by the back door in this particular system, but that's not a necessary feature of a memory-based model.] > But the fact that you've found > predictability in the patterns of speech doesn't necessarily provide an > explanation of those patterns -- other than perhaps we are in the habit of talking > about the same things for the same reasons in the same ways from day to day. What more explanation do you need? ;-) -- Rob Malouf Department of Linguistics and Oriental Languages San Diego State University From hstahlke at bsu.edu Mon Aug 30 17:32:59 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 12:32:59 -0500 Subject: "the" (2) Message-ID: Steve, The Septuagint (3rd - 2nd c. BCE) uses christou in ISam12:3 to mean "anointed one". The form shows up regularly in Samuel/Kings. In ISam24:6 David refers to Saul as "the lord's anointed", using to: christo: kyriou. But I don't have a Classical Greek concordance handy, so I don't know how it would have been used in that body of literature where a notion of messiah didn't exist. Herb -----Original Message----- From: Salinas17 at aol.com [mailto:Salinas17 at aol.com] Sent: Monday, August 30, 2004 11:41 AM To: FUNKNET at LISTSERV.RICE.EDU Subject: Re: [FUNKNET] "the" (2) In a message dated 8/30/04 12:18:04 PM, hstahlke at bsu.edu writes: << Correct, as a Greek translation of Aramaic meshiha, Hebrew mashah. >> Herb- Was chrio:/christos ever used in Greek before Christianity in the sense of "anoint in consecration" or did it ever appear as a title (proper noun)? In other words, did the translation also carry a new meaning into Greek? I think earlier Hebrew kings were also called "the anointed ones" in Hebrew. 
Steve From language at sprynet.com Mon Aug 30 18:02:31 2004 From: language at sprynet.com (Alexander Gross) Date: Mon, 30 Aug 2004 14:02:31 -0400 Subject: Fw: [FUNKNET] extension of "the" Message-ID: ----- Original Message ----- From: "Alexander Gross" To: "Rob Malouf" Sent: Sunday, August 29, 2004 9:42 PM Subject: Re: [FUNKNET] extension of "the" > Thanks, Rob, i've been reading similar literature for the past 25 years & > first discussed this problem in the early 'sixties with my brother-in-law > Morton Astrahan, the IBM VP then in charge of preparing their MT project. > He was pretty sure they'd have most of the bugs ironed out in time for their > demonstration at the NY World's Fair of 1964. > > A lot of this depends on who is doing the reporting. You might want to look > at the following on-line report in the current Translation Journal: > > Machine Translation and Computer-Assisted Translation: a New Way of > Translating? > by Olivia Craciunescu, Constanza Gerding-Salas, and Susan Stringer-O'Keeffe, > > it's at: > > http://www.accurapid.com/journal/29computers.htm > > very best! > > alex > > > ----- Original Message ----- > From: "Rob Malouf" > To: "Alexander Gross" > Cc: > Sent: Sunday, August 29, 2004 6:21 PM > Subject: Re: [FUNKNET] extension of "the" > > > > > > On Aug 29, 2004, at 1:24 PM, Alexander Gross wrote: > > > Since you (singular and plural) imagine that it will one day be > > > possible to construct an "adequate" machine translation system, > > > here is *your* little assignment. It's easy, it's all in English. I > > > want you to come up with the precise, practical rules by which > > > we decide to put "the" in front of a noun as opposed to when > > > we decide to put "a" or "an" in front of a noun as opposed to > > > when we decide to put absolutely nothing ("zero-grade article") > > > in front of a noun. Also: precisely when do we have a choice > > > between two possible methods? 
> > > > While no one (that I know of) has written such rules, there's been > > considerable work on addressing this problem using machine learning and > > statistical models. For example, this paper reports some early > > experiments: > > > > http://citeseer.ist.psu.edu/minnen00memorybased.html > > > > I know they've improved on these results since then, but I can't find > > the reference off hand. At any rate, the performance of the best > > models is getting close to that of humans at guessing which article > > will be used in a given context. > > --- > > Rob Malouf > > rmalouf at mail.sdsu.edu > > Department of Linguistics and Oriental Languages > > San Diego State University > > > > > From language at sprynet.com Mon Aug 30 18:31:07 2004 From: language at sprynet.com (Alexander Gross) Date: Mon, 30 Aug 2004 14:31:07 -0400 Subject: The Chinese Diplomat's "the" Message-ID: > The amazing thing is that this actually works! If we take a corpus, > strip out all the articles, and use the system to try to recover them, > it's right almost 85% of the time. I'm disappointed to see that claims like "it's right almost 85% of the time" are still being advanced by MT advocates. Here's what I had to say about this twelve years ago in my Limitations of Computers as Translation Tools (in Computers in Translation: A Practical Approach, Routledge, 1992): --------------------------------------------- Also often encountered in the literature are percentage claims purportedly grading the efficiency of computer translation systems. Thus, one language pair may be described as `90% accurate' or `95% accurate' or occasionally only `80% accurate.' The highest claim I have seen so far is `98% accurate.' Such ratings may have more to do with what one author has termed spreading `innumeracy' than with any meaningful standards of measurement. 
On a shallow level of criticism, even if we accepted a claim of 98% accuracy at face value (and even if it could be substantiated), this would still mean that every standard double-spaced typed page would contain five errors--potentially deep substantive errors, since computers, barring a glitch, never make simple mistakes in spelling or punctuation. It is for the reader to decide whether such an error level is tolerable in texts that may shape the cars we drive, the medicines and chemicals we take and use, the peace treaties that bind our nations. As for 95% accuracy, this would mean one error on every other line of a typical page, while with 90% accuracy we are down to one error in every line. Translators who have had to post-edit such texts tend to agree that with percentage claims of 90% or less it is easiest to have a human translator start all over again from the original text. On a deeper level, claims of 98% accuracy may be even more misleading--does such a claim in fact mean that the computer has mastered 98% of perfectly written English or rather 98% of minimally acceptable English? Is it possible that 98% of the latter could turn out to be 49% of the former? There is a great difference between the two, and so far these questions have not been addressed. ----------------------------------------------------- (Full text of this piece available on my website under the Linguistics/MT menu at:) http://language.home.sprynet.com very best to all! alex ----- Original Message ----- From: "Rob Malouf" To: Cc: Sent: Monday, August 30, 2004 11:22 AM Subject: [FUNKNET] Re: The Chinese Diplomat's "the" > Hi, > > On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote: > > In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes: > > << At any rate, the performance of the best models is getting close to > > that > > of humans at guessing which article will be used in a given context. 
>> > > > > There's an irony to why one sees such adherence to structuralist > > criteria on > > the "functional" linguistics list. In most situations, of course, a > > computer > > model cannot possibly predict the use of "the" versus "a" unless it > > also reads > > minds. > > It's hard for me to imagine anything less "structuralist" than an > instance-based model like this one. The system produces an article for > a sequence like "please get ___ car" by searching a reference corpus > for similar patterns. If it finds sequences like "please get the car" > more often than "please get a car" or "please get car", it produces a > "the". > > The amazing thing is that this actually works! If we take a corpus, > strip out all the articles, and use the system to try to recover them, > it's right almost 85% of the time. This can be further improved > somewhat by providing the system with an ontology of noun meanings (so > it can draw generalizations about words which don't occur in the > reference corpus but have very similar meanings to words which do). > No, it's never going to be right 100% of the time, at least until we > can read minds, but in most situations, very simple information about > the context is all that's needed. > > A system like this has obvious applications for machine translation, > but the reason we first got to thinking about this problem was in the > context of an adaptive communication system. We were working with an > ALS patient who was completely paralyzed: he couldn't speak, move, or > even breathe on his own, but by moving his eyes he could spell out > simple messages. This was very fatiguing for him, and the messages > tended to be highly telegraphic: "please get the car" might well come > out as "ge cr". His family could understand what he meant, but no one > else could. This program for generating articles was part of a larger > system to "translate" things like "ge cr" into fluent, polite English: > "please get the car". 
You might think that this could only be done > reliably with full mind reading ability and/or a vast store of general > world knowledge, and it's easy to make up isolated examples where > that's true. But, it turns out that in real life it can be done > remarkably well using very simple tricks. So, yeah, if he'd ever > wanted to tell a valet to "please get a car", the system would have > inserted an unwanted "the". Fortunately, hardly anyone ever does that, > so the problem doesn't come up very often. > --- > Rob Malouf > rmalouf at mail.sdsu.edu > Department of Linguistics and Oriental Languages > San Diego State University > > From jrubba at calpoly.edu Mon Aug 30 19:22:38 2004 From: jrubba at calpoly.edu (Johanna Rubba) Date: Mon, 30 Aug 2004 12:22:38 -0700 Subject: "the" (2) Message-ID: I don't see "the 405" as placement of an article before a proper name. I do believe it is a short form of "the 405 freeway." If you've listened to enough LA radio traffic reports, you hear alternation between the shorter and longer usage. And perhaps people more expert on SoCal usage can chime in as to whether So. Californians use "the" in front of other proper names. I don't have any awareness of such. I do not hear the usages Steve Long reports, e.g. "the Santa Claus" or "the Popeye". As to "the Christ", I'm sure Gibson was using it in the traditional theological sense, this being a very fundamentalist Catholic movie. But somehow I doubt that this film is responsible for the spread of such usages. It's too recent. "The Donald" has been in common use since long before Gibson's film appeared. My intuition tells me that "the" is inserted in such cases as a campy acknowledgment of his (supposed?) uniqueness and fame, as we say "the sun" and "the moon", because we can be sure everyone knows which sun or moon (or Donald) we are talking about. Re British "the Burger King", this has a familiar ring to me. 
But my memories of British English are too foggy to verify or come up with other examples. Surely there are some Brits out there who subscribe to Funknet ... ? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Johanna Rubba Associate Professor, Linguistics English Department, California Polytechnic State University One Grand Avenue • San Luis Obispo, CA 93407 Tel. (805)-756-2184 • Fax: (805)-756-6374 • Dept. Phone. 756-2596 • E-mail: jrubba at calpoly.edu • Home page: http://www.cla.calpoly.edu/~jrubba ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From Salinas17 at aol.com Mon Aug 30 20:06:43 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 16:06:43 EDT Subject: "the" (3) Message-ID: In a message dated 8/30/04 3:23:41 PM, jrubba at calpoly.edu writes: << I don't see "the 405" as placement of an article before a proper name. I do believe it is a short form of "the 405 freeway." If you've listened to enough LA radio traffic reports, you hear alternation between the shorter and longer usage. >> "the 405 freeway" is for most purposes already a proper name. Like the Hudson (always meaning river not the explorer or the Bay.) There are no other members of the category. It already has separate status from other freeways. The license to shorten it doesn't change that. Chesterton said something about how it was never polite to say "the Queen of England" because it had to be assumed everyone knew what queen you were talking about when you said "the Queen". <> You don't watch enough television. Letterman uses the form a lot. How about "The Shaq?" I heard it during the playoffs. Did you watch the playoffs? Steve Long From david.kronenfeld at ucr.edu Mon Aug 30 20:14:02 2004 From: david.kronenfeld at ucr.edu (David B. Kronenfeld) Date: Mon, 30 Aug 2004 13:14:02 -0700 Subject: "the" (2) Message-ID: Mostly I agree with you. But we do hear or see occasional usage of expressions like "the Donald" or "the Arnold". 
When used they seem to be a way of being a little cute--and of implying that the person in question has become something of either a caricature or a trademark. And, for my examples, "the Arnold" kind of trails after "the terminator"--but as a way of cutting him down a little, while "the Donald" sort of cuts our supreme trumpeter down a bit while also making clear that we are talking about a business trademark (not just any old "Donald", but "the Donald"). Language remains a moving target and we continue to do funny things with it. Cheers, David At 12:22 PM 8/30/2004, Johanna Rubba wrote: >I don't see "the 405" as placement of an article before a proper name. I >do believe it is a short form of "the 405 freeway." If you've listened to >enough LA radio traffic reports, you hear alternation between the shorter >and longer usage. And perhaps people more expert on SoCal usage can chime >in as to whether So. Californians use "the" in front of other proper >names. I don't have any awareness of such. I do not hear the usages Steve >Long reports, e.g. "the Santa Claus" or "the Popeye". > >As to "the Christ", I'm sure Gibson was using it in the traditional >theological sense, this being a very fundamentalist Catholic movie. But >somehow I doubt that this film is responsible for the spread of such >usages. It's too recent. "The Donald" has been in common use since long >before Gibson's film appeared. My intuition tells me that "the" is >inserted in such cases as a campy acknowledgment of his (supposed?) >uniqueness and fame, as we say "the sun" and "the moon", because we can be >sure everyone knows which sun or moon (or Donald) we are talking about. > >Re British "the Burger King", this has a familiar ring to me. But my >memories of British English are too foggy to verify or come up with other >examples. Surely there are some Brits out there who subscribe to Funknet ... ? 
> >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >Johanna Rubba Associate Professor, Linguistics >English Department, California Polytechnic State University >One Grand Avenue • San Luis Obispo, CA 93407 >Tel. (805)-756-2184 • Fax: (805)-756-6374 • Dept. Phone. 756-2596 >• E-mail: jrubba at calpoly.edu • Home page: http://www.cla.calpoly.edu/~jrubba >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ David B. Kronenfeld Phone Office 951/827-4340 Department of Anthropology Message 951/827-5524 University of California Fax 951/951-5409 Riverside, CA 92521 email david.kronenfeld at ucr.edu Department: http://Anthropology.ucr.edu/ Personal: http://pages.sbcglobal.net/david-judy/david.html Society for Anthropological Sciences: http://anthrosciences.org/index. From Salinas17 at aol.com Mon Aug 30 20:27:35 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Mon, 30 Aug 2004 16:27:35 EDT Subject: "the" (3) Message-ID: In a message dated 8/30/04 1:32:51 PM, hstahlke at bsu.edu writes: << But I don't have a Classical Greek concordance handy, so I don't know how it would have been used in that body of literature where a notion of messiah didn't exist. >> In Liddell-Scott, the first Greek references to "anoint as a consecration" are Christian. And I don't see it as an epithet in Greek before Christ. Before that it's mainly about smearing oil on the body or white-washing a house or stucco -- nothing particularly religious. The meaning of "anointing" in Greek seems pretty concrete and mundane at an earlier time. So here it seems is a Greek word that changed drastically in its main meaning when it was used to translate a foreign word. A small lesson perhaps in how new ideas travel as a change in words. Steve From language at sprynet.com Mon Aug 30 20:35:59 2004 From: language at sprynet.com (Alexander Gross) Date: Mon, 30 Aug 2004 16:35:59 -0400 Subject: extension of "the" Message-ID: Thanks, Wendy. 
It will be two weeks before i am back in NYC & can look up Peter Master's contributions on this subject. But so far as i can see from summaries on the web, i don't think he and i would have too many differences here. That's because he is concerned with practical solutions for helping ESL students to learn and not merely airy universalist linguistic cause-mongering or promotions for MT, just as i am concerned with training translators & breaking through to describing how language actually works. So far as i can tell, he maintains that grammar can be usefully taught in ESL courses and breaks down his method for treating articles into beginning, intermediate, and advanced phases (none of which is going to help MT programmers very much). OTOH i believe there is a dispute within the ESL community between those who emphasize teaching grammar & those who just want to start their students talking some form of English. It's probable that most of us who learn foreign languages as adults will never speak them perfectly, whichever course is chosen. My Spanish was good enough 50 years ago to get me a job as a bilingual radio announcer in Madrid for Radio Nacional de Espan~a. And I've boasted that I speak it fluently ever since, which in some ways I do. And I'm also fairly justified in my claim that I can speak, read, translate from, interpret brief dialogs into, and even write (with some help from a native editor) five or six languages (six including both British and American English :-) ). But I'm just now going through the hard slogging of preparing for a conference in Xalapa, Mex., and I'm becoming painfully aware of how "broken" my Spanish really is. But at least I'm aware of it, which means that I can improve it a bit. Contrary to Steve's fantasies that all language can be broken down to Roger Schank-like scenarios involving dialogues with car valets, both grammar and accent really do matter in most languages. very best to all! 
alex ----- Original Message ----- From: WENDY SMITH To: Alexander Gross Cc: clements ; funknet at mailman.rice.edu ; rronques at indiana.edu Sent: Sunday, August 29, 2004 4:40 PM Subject: Re: [FUNKNET] extension of "the" See work by Peter Master. He did his dissertation (UCLA) on this topic. ----- Original Message ----- From: Alexander Gross Date: Sunday, August 29, 2004 1:24 pm Subject: Re: [FUNKNET] extension of "the" > > > > > Does anyone know of any studies on the extension of the use of > "the". > In her home town (Stafford VA), a student of mine noted > that "the" > > can be used: > > I find it fascinating that anyone would assume that "the" might > have a > "normal" use which could then be subject to extension. And that > there would > be any studies which could conceivably place its usage within any > sort of > normative range at all or explore the possible range of extensions. > > I wonder if this may be just one further offshoot from the > illusion shared > by many linguists that the guiding principles of language have been > discovered, described, and even codified. Or in Steven Pinker's words, > linguists have found "the single mental design underlying" all > languages and > "we all have the same minds." > > Four years ago I issued a challenge not only to all those on the > sci.lang USENET newsgroup concerning this matter, it was in fact a > repeat of > the very challenge I had also issued a few years earlier to one of the > foremost founders of AI, a master mathematician and a name so > eminent as to > require no further airing here (though the curious may discover it by > running a Deja search on the sci.lang archives). > > Neither this expert nor the linguists on sci.lang were able to > come up with > a response to this challenge. I am now readdressing it to my > colleagues on > FUNKNET to discover if they will fare any better with it. 
> > The challenge went as follows: > > --------------------------------------------- > > Since you (singular and plural) imagine that it will one day be > possible to construct an "adequate" machine translation system, > here is *your* little assignment. It's easy, it's all in English. > I > want you to come up with the precise, practical rules by which > we decide to put "the" in front of a noun as opposed to when > we decide to put "a" or "an" in front of a noun as opposed to > when we decide to put absolutely nothing ("zero-grade article") > in front of a noun. Also: precisely when do we have a choice > between two possible methods? > > Further requirement: the rule or rules you come up with have to work > for ALL instances of putting articles in front of nouns. The rules > should be so fool-proof and logically transparent that we can even > make an expert system paradigm out of them, so that anyone who > needed to know which rule to apply could simply consult the expert > system and find the right answer. You'll need something like this > for that "adequate" MT system--it will be crucial to spell out these > rules for English, especially since some fairly different ones apply > to almost any foreign language you can name. And even languages > without articles as such, like Russian and Chinese, have a few > quirks in this regard, to say nothing of the problems of translating > all these languages into and out of English. Today's most advanced > MT systems get all this wrong as often as right. > > But there's another and even better reason for coming up with a > solution. I've tried this task more than once, so it's more than > an idle > riddle. I was first asked to come up with a solution by a > Chinese senior revisor & computer linguist friend at the UN > translation department who himself had trouble deciding which > article to use. > I was eager to solve it for him, and I was almost certain I could come > up with the solution quite easily. 
I was also interested because some > of my students in a translator-training course I was then teaching > also asked me for the same solution. > > They really needed the answer, because they continually made > mistakes with articles both in their writing and speech, which > made it sound as though all they could manage was "broken English." > And this is what many people think when foreigners get their articles > wrong, either in speech or in translations. But these were perfectly > literate & intelligent people--they just couldn't figure out the rules > for English articles. > > The point here is not merely to come up with the usual explanation > for this problem (which amounts to little more than saying "when > something is definite, it takes the definite article, when something > is indefinite, it...). The point IS to come up with a clear set of > rules that can help foreigners to learn English. And beyond that > can incidentally also serve as the basis for an "adequate" MT program. > > Perhaps you also will make the mistake of supposing--as I did--that > this is a trivial problem. Believe me--it isn't. I had no trouble > coming up with the first two or three rules, but there were still > many inexplicable instances, where I had to say lamely to my > students "Learn the Language." I ended up weaseling out by telling > both my students and my friend at the UN to read the NY Times & > other sources & try to figure out for themselves why "a" or "the" > or neither one is used. As Martin Kay has pointed out, you can > throw all the computing power in the world at MT and still come > up empty. At what point does a trivial problem become an > intractable one? > > -------------------------------- > > Let me reiterate that while this may look like a simple problem, > it isn't. > Using an If, Then, Else logical framework, I tried to build > something like > an expert system that could represent its terms but couldn't truly get > beyond the first few rules. 
The permissible range for using our > articles varies not merely between British and American English but > within our own US > variety according to differences of region, class, education, national > origin, and age. It may even vary between members of the same > family and > over time within the usage of a single individual. > > And we're talking just about English here--imagine the > complexities that > arise when other languages are brought in. And since this is true > for such > an extremely small subset of structural linguistic problems in a > single language, how much more true must it be for the august, all-embracing, > universalist theory advanced by MIT linguists? To say > nothing of all its > cognitive this and that spinoffs? A French friend tells me the > manual for > French-English conversion of articles looks like a small law book, > which even then is sure to have exceptions and omissions. If after > decades of > detailed rule-seeking and measurements and busy work on the "syntactic > structures" of minute language byways our current school of > linguists can't > solve this problem, then what can they solve? > > very best to all! > > alex > > > ----- Original Message ----- > From: "clements" > To: > Cc: > Sent: Friday, August 27, 2004 11:47 AM > Subject: [FUNKNET] extension of "the" > > > > Dear Funknetters, > > Does anyone know of any studies on the extension of the use of > "the". In > > her home town (Stafford VA), a student of mine noted that "the" > can be > > used: > > > > --With most acronyms > > I have the AOL. > > She has the SARS. > > > > --With generics > > I like the coffee/the candy. (to refer to all coffee or candy) > > > > --With many proper place names. These tend to be specific > references, > especially the store names. If my friend told me she > was going to "the > > Pier 1," I would understand that she meant the Pier 1 in Central > Park. > We are going to the Nashville. > > I'm in the Target. > > He bought it at the Pier 1. 
> > > > I have heard it reported with abstract nouns, as in > > > > I have the diabetes > > > > and a colleague of mine in Fort Wayne IN reported hearing it > from his > > students. > > > > Any leads would be most welcome. If there's interest, I'll > write up a > > summary. > > > > Clancy Clements > > > > > > > > > > > From rmalouf at mail.sdsu.edu Mon Aug 30 22:37:20 2004 From: rmalouf at mail.sdsu.edu (Rob Malouf) Date: Mon, 30 Aug 2004 15:37:20 -0700 Subject: The Chinese Diplomat's "the" In-Reply-To: <013001c48ebf$850a02a0$1b999c04@user1sznx2zyoc> Message-ID: On Mon, 2004-08-30 at 11:31, Alexander Gross wrote: > > The amazing thing is that this actually works! If we take a corpus, > > strip out all the articles, and use the system to try to recover them, > > it's right almost 85% of the time. > > I'm disappointed to see that claims like "it's right almost 85% of the time" > are still being advanced by MT advocates. I'm no MT advocate -- my personal feeling is that MT is impossible, but there are enough people smarter than me who disagree that I hesitate to say that in public. The original motive for the paper that I cited was an adaptive communication device. It had nothing to do with MT. And, in case I didn't make it clear, the "right almost 85% of the time" was for a narrowly defined task, namely recovering omitted articles in monolingual English texts. For that task, according to the results they published, it really is right almost 85% of the time. Unless you are accusing the authors of fraud, I don't see there is any evidence of "innumeracy" here, spreading or otherwise. Absolutely no claims are being made about MT, or how well this program would perform as a component of an MT system, or really even whether a program like this is useful for anything. However, I am making the claim based on this paper (though the authors might not endorse it) that most of the time selecting which article to use in a given context isn't very hard. 
-- Rob Malouf Department of Linguistics and Oriental Languages San Diego State University From hdls at unm.edu Mon Aug 30 23:01:14 2004 From: hdls at unm.edu (High Desert Linguistics Society) Date: Mon, 30 Aug 2004 17:01:14 -0600 Subject: Final Call for HDLS-6 Conference (Nov. 4-6, 2004) Message-ID: The Sixth High Desert International Linguistics Conference will be held at the University of New Mexico, Albuquerque, NM, November 4-6, 2004. The invited keynote speakers are Joan Bybee (University of New Mexico), David McNeill (University of Chicago), and Suzanne Kemmer (Rice University). We invite you to submit proposals for 20-minute talks with 10-minute discussion sessions in any area of linguistics - especially those from a cognitive / functional linguistics perspective. Papers in the following areas are particularly welcome: Evolution of language, Grammaticization, Metaphor & Metonymy, Language change & variation, Sociolinguistics, Bilingualism, Signed languages, Gesture, Native American languages, Language acquisition and Computational Linguistics. The deadline for submitting abstracts is September 3rd, 2004. Abstracts should be sent via email, as an attachment, to hdls at unm.edu. Please include the title "HDLS-6 abstract" in the subject line. MS-Word format is preferred, or RTF if necessary. The e-mail and attached abstract must include the following: 1. Author's Name(s) 2. Author's Affiliation(s) 3. Title of the Paper 4. E-mail address of the primary author The abstract should be no more than one page, in a font no smaller than 11 point. A second page is permitted for references and/or data. Only two submissions per author will be accepted and we will only consider submissions that conform to the above guidelines. Notification of acceptance will be sent out by the evening of September 5th, 2004. If you have any questions or need further information, please contact us at hdls at unm.edu with "HDLS-6 Conference" in the subject line. 
From hstahlke at bsu.edu Tue Aug 31 02:05:21 2004 From: hstahlke at bsu.edu (Stahlke, Herbert F.W.) Date: Mon, 30 Aug 2004 21:05:21 -0500 Subject: "the" (3) Message-ID: I agree overall with your analysis. However, I've checked Thayer's Greek-English Lexicon of the New Testament, which also includes Septuagint and Hellenistic sources. ho christos shows up all over the Septuagint in its "anoint as consecration" meaning: Ps.114:15, Ps.2:2, Hab.3:13, all over Samuel/Kings/Chronicles, and even in reference to a foreign king, Cyrus, in Is.45:1. The word did not have this meaning in pre-Christian non-Jewish writing, but pre-Christian Hellenistic Judaism did extend the secular meaning to its sacred needs, antedating and perhaps establishing NT usage a couple of centuries earlier. Herb Subject: Re: [FUNKNET] "the" (3) In a message dated 8/30/04 1:32:51 PM, hstahlke at bsu.edu writes: << But I don't have a Classical Greek concordance handy, so I don't know how it would have been used in that body of literature where a notion of messiah didn't exist. >> In Liddell-Scott, the first Greek references to "anoint as a consecration" are Christian. And I don't see it as an epithet in Greek before Christ. Before that it's mainly about smearing oil on the body or white-washing a house or stucco -- nothing particularly religious. The meaning of "anointing" in Greek seems pretty concrete and mundane at an earlier time. So here it seems is a Greek word that changed drastically in its main meaning when it was used to translate a foreign word. A small lesson perhaps in how new ideas travel as a change in words. 
Steve From language at sprynet.com Tue Aug 31 07:59:47 2004 From: language at sprynet.com (Alexander Gross) Date: Tue, 31 Aug 2004 03:59:47 -0400 Subject: The Chinese Diplomat's "the" Message-ID: Good, Rob, glad to hear you think it's impossible, though that's probably not the whole story either, and as the source i cited mentioned, all the work that has been done (& all the billions of $ spent so far) could end up helping translators to work more efficiently, though CAT & TM already do this. The real kicker is that even if they finally perfect MT, the only people who will be able to handle the system & make corrections will end up being human translators, or at least those human translators willing to work with it. No, i'm certainly not accusing the authors of fraud. But i do have to tell you that there have been some genuine instances of fraudulent demos in this field, documented back in the 80s in the pages of Language Technology (the precursor of WIRED Magazine), which i wrote for at the time. Also, on one occasion three of us, the UN's MT & Terminology expert, the president of the NY Circle of Translators, and myself, had no choice but to show up for a press conference promoting a blatantly fraudulent MT system. Fortunately the press managed to figure it out for themselves, and we didn't have to say very much. The person behind that system just might be one of those people you feel is smarter than you--or perhaps the teacher of some of them. What's more, all through the late 80s one MT company ran ads promising that with their system monolinguals will perform "truly automatic translation .....without assistance from bilinguals, polyglots or post-editors.....but meeting the quality standards of professional translators-no less." That guy is still quite active in the field but now promises no more than further improvements in TM (Translation Memory). 
> I'm no MT advocate -- my personal feeling is that MT is impossible, but > there are enough people smarter than me who disagree that I hesitate to > say that in public. The original motive for the paper that I cited was > an adaptive communication device. It had nothing to do with MT. And, > in case I didn't make it clear, the "right almost 85% of the time" was > for a narrowly defined task, namely recovering omitted articles in > monolingual English texts. For that task, according to the results they > published, it really is right almost 85% of the time. Unless you are > accusing the authors of fraud, I don't see there is any evidence of > "innumeracy" here, spreading or otherwise. > > Absolutely no claims are being made about MT, or how well this program > would perform as a component of an MT system, or really even whether a > program like this is useful for anything. However, I am making the > claim based on this paper (though the authors might not endorse it) that > most of the time selecting which article to use in a given context isn't > very hard. There i certainly agree with you. But remember, it isn't very hard for you & me, but it's bewilderingly difficult for many ESL & translator-training students. Anyway, 85% still isn't going to cut it, and i can't help wondering if they asked their system to choose only between definite and indefinite articles, in which case the law of averages would already credit both alternatives with 50%. Even if they allowed for zero grade articles, that would still give all three alternatives a 33% free boost before the test went further. very best! 
alex > -- > Rob Malouf > Department of Linguistics and Oriental Languages > San Diego State University > > > From Salinas17 at aol.com Tue Aug 31 14:40:55 2004 From: Salinas17 at aol.com (Salinas17 at aol.com) Date: Tue, 31 Aug 2004 10:40:55 EDT Subject: The Chinese Diplomat's "the" (3) Message-ID: In a message dated 8/30/04 4:37:17 PM, language at sprynet.com writes: << Contrary to Steve's fantasies that all language can be broken down to Roger Schank-like scenarios involving dialogues with car valets, both grammar and accent really do matter in most languages. >> Well, obviously a problem with my scenario would be that it gave Alex the impression that I was saying grammar and accent don't matter. (Reminded me of one of the more memorable Roger Schank lines: "People don't remember what you say. They remember what they say.") One of my points was that there are actually two different kinds of "bad grammar." There's one kind that makes my speech incomprehensible to listeners. There's another kind that sounds wrong "grammatically" but is nevertheless understandable by listeners. (Time for more scenarios.) A child recently told me that he "waked up in the morning..." I corrected him but understood what he was saying. That's bad grammar that doesn't directly interfere with communication, except to the extent that it distracts or affects the willingness of the listener to listen. However, the Chinese diplomat scenario appears to teach us that whether grammar is faulty can often depend on non-linguistic factors (i.e., whether the embassy owns many cars or just one car -- ie, "get a car" or "get the car"). Some sociolinguists have had a habit of calling these non-linguistic factors "context", in the sense of surrounding circumstances. But the fact is they are the core reason we are speaking in the first place. If our diplomat has no interest in cars, he should logically have nothing to say and the correct article and other grammar problems do not arise. 
What Rob originally wrote was: "At any rate, the performance of the best [computer] models is getting close to that of humans at guessing which article will be used in a given context." What I was challenging in that statement was how a computer could know "context" -- the non-linguistic ingredients in the soup. From what I can tell, the computer thinks "get the car" is more likely than "get a car" because "get the car" or something like it has been more likely in the past. This is not "context" in the sense of reference, which involves non-linguistic factors. It's "context" in the sense of word sequence and adjacency history and constraints on sentence structure. That's an important difference in terminology and one I thought worth mentioning. It seems to confuse the computer generated language issues a lot. Particularly because "a car" versus "the car" is NOT always a matter that can be solved without looking outside language and in the real world. The parking valet teaches us that. A machine cannot solve that problem on its own. It just doesn't know whether "a car" or "the car" is correct in that circumstance. It doesn't know whether the diplomat should choose one or the other. And of course we can't say which is correct unless we also have such knowledge. Alex also writes: <<... just as i am concerned with ...breaking through to describing how language actually works. >> Let me suggest a place to start. A friend recently received a phone message from a colleague with a strong Southern accent. She and I could make out at best five words out of two dozen. We're all competent native English speakers, but the message to us was incomprehensible. That's an example of when language "actually doesn't work" though it should. Let me suggest that explaining why it didn't work might go a long way towards explaining how it works, when it does work. 
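To make the earlier point about word-sequence "context" concrete, here is a minimal sketch in Python. It is not the system Rob cited. The three-sentence corpus and the names TOY_CORPUS, train and guess_article are invented purely for illustration, and the only thing the code does is count which article preceded a given word in the past.

```python
from collections import Counter, defaultdict

ARTICLES = {"a", "an", "the"}

# Hypothetical toy corpus, invented for illustration only.
TOY_CORPUS = [
    "get the car from the garage",
    "please get the car",
    "she bought a car",
]

def train(sentences):
    """For each non-article word, count what came directly before it:
    one of the articles, or "(none)"."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok in ARTICLES:
                continue
            prev = tokens[i - 1] if i > 0 else None
            label = prev if prev in ARTICLES else "(none)"
            counts[tok][label] += 1
    return counts

def guess_article(counts, next_word):
    """Return the most frequent label seen before next_word,
    or None if the word never occurred in the training text."""
    seen = counts.get(next_word.lower())
    if not seen:
        return None
    return seen.most_common(1)[0][0]

counts = train(TOY_CORPUS)
print(guess_article(counts, "car"))      # chosen from past frequency alone
print(guess_article(counts, "bicycle"))  # unseen word: no basis for a guess
```

Nothing in this sketch knows how many cars the embassy owns. Its guess for "car" reflects only the counts in the toy corpus, which is "context" in the adjacency sense and not in the sense of reference.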
BTW, there's a humorous piece on the web about "the THE" by Peter Master at: http://aaal.lang.uiuc.edu/letter/23.2/theology.html Regards, Steve Long