From pb at lpl.univ-aix.fr Sat Apr 8 09:53:57 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:53:57 +0200
Subject: Call: TAPD-2000
Message-ID:
From: "Miguel A. Alonso Pardo"

======================================================================
                        Final CALL FOR PAPERS
----------------------------------------------------------------------
                             TAPD 2000
       2nd Workshop on 'Tabulation in Parsing and Deduction'
----------------------------------------------------------------------
September 19-21, 2000
Vigo, Spain

Sponsored by the University of Vigo
with the support of Caixa Vigo e Ourense and Logic Programming Associates

http://coleweb.dc.fi.udc.es/tapd2000/

Following TAPD'98 in Paris (France), the next TAPD event will be held in Vigo (Spain) in September 2000. The conference will take place just before SEPLN 2000 (http://coleweb.dc.fi.udc.es/sepln2000/), the conference of the Spanish Society for Natural Language Processing.

MOTIVATIONS:

Tabulation techniques are becoming a common way to deal with the highly redundant computations that occur, for instance, in Natural Language Processing, Logic Programming, Deductive Databases, or Abstract Interpretation, and that are related to phenomena such as ambiguity, non-determinism or domain ordering. Different approaches, including for example Chart Parsing, Magic-Set rewriting, Memoization, and Dynamic Programming, have been proposed whose key idea is to keep traces of computations in order to achieve computation sharing and loop detection. Tabulation also offers more flexibility to investigate new parsing or proof strategies and to represent ambiguity by shared structures. The first objective of this workshop is to compare and discuss these different approaches. The second objective is to present tabulation and tabular systems to potential users in different application areas such as natural language processing, picture parsing, genome analysis, or complete deduction techniques.
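[Editor's illustration, not part of the original call: the key idea above — keeping traces of computations to achieve sharing and loop-free recomputation — can be sketched with a minimal memoization example. The function names and the Fibonacci toy problem are illustrative choices, not material from the workshop.]

```python
from functools import lru_cache

calls = 0

def fib_plain(n):
    """Naive recursion: recomputes shared subproblems exponentially often."""
    global calls
    calls += 1
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_tab(n):
    """Tabulated version: each subproblem is solved once and its trace kept."""
    return n if n < 2 else fib_tab(n - 1) + fib_tab(n - 2)

result = fib_plain(20)   # 21891 recursive calls without tabulation
shared = fib_tab(20)     # same answer (6765), only 21 distinct subproblems solved
```

Chart parsing, magic-set rewriting and tabled resolution apply the same principle to parse items, derived facts and resolution goals rather than to integer arguments.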
TOPICS (not exclusive):
-- Tabulation Techniques: Chart Parsing, Tabling, Memoization, Dynamic Programming, Magic Set, Generic Fix-Point Algorithms
-- Applications: Parsing, Generation, Logic Programming, Deductive Databases, Abstract Interpretation, Deduction in Knowledge Bases, Theorem Proving
-- Static Analysis: Improving tabular evaluation
-- Parsing or resolution strategies
-- Efficiency issues: Dealing with large tables (structure sharing, term indexing), Execution models, Exploiting the domain ordering (subsumption)
-- Shared structures (parse or proof forests): Formal analysis, representation and processing

WORKSHOP FORMAT:

The workshop will be a 3-day event that provides a forum for individual presentations of the accepted contributions as well as group discussions.

INVITED SPEAKERS:

Bharat Jayaraman -- Univ. of New York at Buffalo, US
I.V. Ramakrishnan -- Univ. New York at Stony Brook, US

SUBMISSION PROCEDURE:

Authors are invited to submit, before April 28, a 4-page position paper or abstract concerning a theoretical contribution or a system to be presented. Due to tight time constraints, submission and reviewing will be handled exclusively electronically (LaTeX, PostScript, DVI or ASCII format). Submissions should include the title, authors' names, affiliations, addresses, and e-mail. Submissions must be sent to David S. Warren (warren at cs.sunysb.edu) as gzipped PostScript.

SCHEDULE:

Submission of contributions: April 28, 2000
Notification of acceptance: June 1, 2000
Final versions due: June 30, 2000

PROGRAM COMMITTEE CHAIR:

David S. Warren -- Univ. New York at Stony Brook, US

PROGRAM COMMITTEE:

Francois Bry -- Univ. Munich, Germany
Manuel Carro -- Univ. Polit. Madrid, Spain
Eric de la Clergerie -- INRIA, France
Veronica Dahl -- Univ. Simon Fraser, Canada
Baudouin Le Charlier -- Univ. Namur, Belgium
Mark Jan Nederhof -- DFKI, Germany
Luis M. Pereira -- Univ. Nova de Lisboa, Portugal
Martin Rajman -- EPFL, Switzerland
Domenico Sacca -- Univ.
della Calabria, Italy
Kostis Sagonas -- Univ. Uppsala, Sweden
David Shasha -- Univ. New York, US
Terrance Swift -- Univ. New York at Stony Brook, US
Manuel Vilares -- Univ. Vigo, Spain
David Weir -- Univ. Sussex, UK

ORGANIZING COMMITTEE CHAIR:

Manuel Vilares -- Univ. Vigo, Spain

ORGANIZING COMMITTEE:

Miguel A. Alonso -- Univ. Coruna, Spain
Eric de la Clergerie -- INRIA, France
David Cabrero -- Univ. Vigo, Spain
Victor M. Darriba -- Univ. Coruna, Spain
David Olivieri -- Univ. Vigo, Spain
Francisco J. Ribadas -- Univ. Coruna, Spain
Leandro Rodriguez -- Univ. Vigo, Spain

PUBLICATION:

Papers accepted by the Program Committee must be presented at the conference and will appear in a proceedings volume. The format for camera-ready manuscripts will be available from the web page of the event.

LOCATION:

Auditorio del Centro Cultural Caixavigo e Ourense
Marques de Valladares
Vigo, Spain

FURTHER INFORMATION:

For further details consult http://coleweb.dc.fi.udc.es/tapd2000/, or contact:
TAPD'2000 Secretariat
Escuela Superior de Ingeniería Informática
Campus as Lagoas, s/n
32004 Ourense, Spain
E-mail: tapd-secret at ei.uvigo.es
Fax: +34 988 387001

___________________________________________________________________
Message distributed by the Langage Naturel list
Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/
English version: http://www.biomath.jussieu.fr/LN/LN/
Archives: http://web-lli.univ-paris13.fr/ln/

From pb at lpl.univ-aix.fr Sat Apr 8 09:54:33 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:54:33 +0200
Subject: Job: 1 Offer
Message-ID:
From: "lilian.blochet"

Please circulate this job offer:

Technologies GID, publisher of Spirit, a natural-language Internet/Intranet search engine, is recruiting a Linguistic Project Manager.

Within the Research and Development department, this person will be in charge of:
- Leading a team of linguists working on the enrichment of the linguistic resources for French and English
- Following up relations with foreign partners for Spanish, Portuguese, Dutch and German
- Building the tools needed for resource management
- Taking part in the specification, prototyping and testing of new versions of the morpho-syntactic analyser
- Technology watch

The successful candidate will have an advanced degree in computational linguistics or natural language processing and a few years of professional experience, and must be autonomous in the Unix environment and in Perl/Awk programming. Native speaker of French or English. Salary according to profile and experience.

Send your application by e-mail to mailto:lilian.blochet at technologies-gid.com

Technologies GID
84/88 Bld de la Mission Marchand
92411 Courbevoie Cedex
-------------------------------------------------------
Lilian Blochet
Technologies-GID
Head of the Research and Development Department
mailto:lilian.blochet at technologies-gid.com
* 01 49 04 70 70

From pb at lpl.univ-aix.fr Sat Apr 8 09:55:07 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:55:07 +0200
Subject: Call: Workshop on HCC
Message-ID:
From: Yorick Wilks

THIRD ANNOUNCEMENT AND FINAL CALL FOR ABSTRACTS
(Apologies if you receive this from more than one source.)

THIRD WORKSHOP ON HUMAN-COMPUTER CONVERSATION
Grand Hotel Villa Serbelloni, Bellagio, Italy
3-5 July, 2000

Everything is on the website, including online registration information, hotels (from simple to sumptuous), the glorious site, etc. The key date is 8 April, when abstracts are due, and that is only a few days away. Hotel accommodation should be booked as soon as possible.
www.dcs.shef.ac.uk/research/units/ilash/Meetings/bellagio/

Invited speakers include (not all have yet accepted):
Dr B. Alabiso, Microsoft, USA
Dr J. Hutchens, UWA, Australia
Prof. G. Leech, University of Lancaster, UK
Dr U. Reithinger, DFKI-Saarbruecken, DE
Dr T. Strzalkowski, General Electric, USA
Prof. D. Traum, U Maryland, USA

The Workshops on Human-Computer Conversation in Bellagio, Italy, took place in 1997 and 1998: small groups of experts from industry and academia met to discuss this pressing question for the future of Language Engineering, not only as an academic question, but chiefly to bring forward for discussion computer demonstrations and activities within company laboratories that were not being published or discussed. The Workshops were highly successful in these aims, and we now wish to widen participation and add distinguished speakers, as well as introducing more theoretical topics, though without losing the practical emphasis. The site remains one of the finest in the world, and it promoted excellent and intimate discussions in 1997 and 1998.

The emphasis this year will take note of the EC Fifth Framework calls announced under Human Language Technology, and in particular the emphasis on interactivity. We also plan to emphasise (in invited talks) the issue of politeness and whether it is crucial or dispensable to conversation, as well as recent results on statistical/empirical work on dialogue corpora and on deriving marked-up dialogue corpora.

All details, including previous programs, the program committee, accommodation, travel and registration, are on the web site. Contributions are invited on any aspect of human-computer conversation, as are demonstrations.
Two-page abstracts should be sent by mail or email to the address at the bottom according to the following timetable:

Deadline for submission: 8 April 2000
Notice of acceptance: 8 May 2000
Camera-ready paper due: 8 June 2000

The European Association for Computational Linguistics (EACL), SigDial and ELSNET have endorsed the meeting.

Submissions and further enquiries to: hccw at dcs.shef.ac.uk

Yorick Wilks
HCCW '2000
Department of Computer Science
University of Sheffield
211 Portobello St., Sheffield S1 4DP, UK
phone: (44) 114 222 1814
fax: (44) 114 222 1810
email: hccw at dcs.shef.ac.uk
www: http://www.dcs.shef.ac.uk/~yorick

From pb at lpl.univ-aix.fr Sat Apr 8 09:57:09 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:57:09 +0200
Subject: Call: DCAGRS 2000
Message-ID:
From: Helmut Jurgensen

Final Call: DCAGRS 2000
Descriptional Complexity of Automata, Grammars and Related Structures

Please note that the submission deadline has changed to 15 April 2000.

Submissions concerning the descriptional complexity of automata, grammars and related structures are invited for a workshop to be held in London, Ontario, on 27--29 July, 2000.
Topics include, but are not limited to, the following:
-- various measures of descriptional complexity of automata, grammars and languages
-- circuit complexity of Boolean functions and related measures
-- succinctness of description of (finite) objects
-- descriptional complexity in resource-bounded or structure-bounded environments
-- structural complexity

Papers on applications of such issues, for instance in the fields of software or hardware testing or systems modelling, as well as demonstrations of systems related to these issues, are also welcome.

DCAGRS 2000 will be part of a three-conference event, held at the University of Western Ontario in London, Ontario, Canada, in the week of July 24 to 29, 2000:
-- CIAA 2000, the Conference on the Implementation and Application of Automata, held on 24--25 July.
-- Half Century of Automata Theory, held on 26 July.
-- DCAGRS 2000, held on 27--29 July.

There will also be a workshop on coding theory held on 31 July and 1 August at the same location. For more information about these events visit any of the following www-sites:
www.csd.uwo.ca/~ciaa2000 (CIAA 2000)
www.cs.uni-potsdam.de/~dcagrs (DCAGRS 2000)
www.cs.uni-potsdam.de/~dcagrs/triconf.html (Tri-Conference)
www.cs.uni-potsdam.de/~dcagrs/codes.html (Coding Theory)
www.csd.uwo.ca/~automata (Half Century)
and follow the links from there.

The DCAGRS 2000 deadlines are as follows:
-- 15 April 2000, submission of papers
-- 5 May 2000, notification of authors
-- 1 May 2000, submission of demo proposals
-- 1 July 2000, submission of final copy for pre-proceedings
-- 27--29 July 2000, workshop

Details regarding the submission procedures are available on the www. If you have difficulties accessing the www we can send you the information by email.
In that case, please send your request to boldt at cs.uni-potsdam.de (Oliver Boldt).

DCAGRS is sponsored by IFIP WG 1.2.

Conference chair for DCAGRS 2000: Helmut Jurgensen, helmut at uwo.ca

From pb at lpl.univ-aix.fr Sat Apr 8 09:57:45 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:57:45 +0200
Subject: Call: COLING-2000 Workshop
Message-ID:
From: Remi Zajac

Call for submissions for the COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems
Centre Universitaire, Luxembourg, 5 August 2000
(see also this call at http://crl.nmsu.edu/Events/COLING00)

Background

The purpose of the workshop is to present the state of the art in NLP toolsets and workbenches that can be used to develop multilingual and/or multi-application NLP components and systems. Although technical presentations of particular toolsets are of interest, we would like to emphasize methodologies and practical experiences in building components or full applications using an NLP toolset. Combined demonstrations and paper presentations are strongly encouraged.

Many toolsets have been developed to support the implementation of single NLP components (taggers, parsers, generators, dictionaries) or complete Natural Language Processing applications (Information Extraction systems, Machine Translation systems). These tools aim at facilitating and lowering the cost of building NLP systems. Since the tools themselves are often complex pieces of software, they require a significant amount of effort to be developed and maintained in the first place. Is this effort worth the trouble?
It is worth noting that NLP toolsets have often originally been developed for implementing a single component or application. In that case, why not build the NLP system using a general programming language such as Lisp or Prolog? There are at least two answers. First, for pure efficiency reasons (speed and space), it is often preferable to build a parameterized algorithm operating on a uniform data structure (e.g., a phrase-structure parser). Second, it is harder, and often impossible, to develop, debug and maintain a large NLP system written directly in a general programming language.

It has been the experience of many users that a given toolset is quite often unusable outside its original environment: the toolset can be too restricted in its purpose (e.g. an MT toolset that cannot be used for building a grammar checker), too complex to use, or even too difficult to install. There have been efforts, in particular in the US under the Tipster program, to promote instead common architectures for a given set of applications (primarily IR and IE in Tipster; see also the Galaxy architecture of the DARPA Communicator project). Several software environments have been built around this flexible concept, which is closer to current trends in mainstream software engineering.

The workshop aims at providing a picture of the current problems faced by developers and users of toolsets, and of future directions for the development and use of NLP toolsets. We encourage reports of actual experiences in the use of toolsets (complexity, training, learning curve, cost, benefits, user profiles) as well as presentations of toolsets concentrating on user issues (GUIs, methodologies, on-line help, etc.) and application development. Demonstrations are also welcome.

Audience

Researchers and practitioners in Language Engineering, users and developers of tools and toolsets.
Issues

Although individual tools (such as a POS tagger) have their uses, they typically need to be integrated into a complete application (e.g. an IR system). Language Engineering issues in toolsets and architectures include (in no particular order):
- Practical experience in the use of a toolset;
- Methodological issues associated with the use of a toolset;
- Benefits and deficiencies of toolsets;
- User (linguist/programmer) training and support;
- Adaptation of a tool (or toolset) to a new kind of application;
- Adaptation of a tool to a new language;
- Integration of a tool in an application;
- Architectures and support software;
- Reuse of data resources vs. processing components;
- NLP algorithmic libraries.

Format of the Workshop

The one-day workshop will include twelve presentation slots, each consisting of a 20-minute presentation followed by 10 minutes reserved for exchanges. We encourage the authors to focus on the salient points of their presentation and to identify possible controversial positions. There will be ample time set aside for informal and panel discussions and audience participation. Please note that workshop participants are required to register at http://www.coling.org/reg.html.

Deadlines

21 May 2000: Submission deadline.
11 June 2000: Notification to authors.
24 June 2000: Final camera-ready copy.
5 August 2000: COLING-2000 Workshop.

Submission Format

Send submissions of no more than 6 pages conforming to the COLING format (http://www.coling.org/format.html) to zajac at crl.nmsu.edu. We prefer electronic submissions in either PDF or PostScript. Final submissions can extend to 10 pages.

Organizing Committee

Rémi Zajac (Chair), CRL, New Mexico State University, USA: zajac at crl.nmsu.edu.
Jan Amtrup, CRL, New Mexico State University, USA: jamtrup at crl.nmsu.edu.
Stephan Busemann, DFKI, Saarbruecken: busemann at dfki.de.
Hamish Cunningham, University of Sheffield: hamish at dcs.shef.ac.uk.
Guenther Goerz, IMMD VIII, University of Erlangen: goerz at immd8.informatik.uni-erlangen.de.
Gertjan van Noord, University of Groningen: vannoord at let.rug.nl.
Fabio Pianesi, IRST, Trento: pianesi at irst.itc.it.

Of Related Interest

The Natural Language Software Registry at http://www.dfki.de/lt/registry/sections.html
The COLING-2000 Web Site at http://www.coling.org/
---

From pb at lpl.univ-aix.fr Sat Apr 8 09:57:51 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:57:51 +0200
Subject: School: CLaRK'2000
Message-ID:
From: Frank Richter

Announcement: Summer School 2000 in Bulgaria - CLaRK'2000

The Tuebingen-Sofia International Graduate Programme in Computational Linguistics and Represented Knowledge (CLaRK) invites applications to a summer school in Sozopol, Bulgaria, this summer.

Dates: August 27th - September 10th, 2000 (days of arrival and departure)
Place: Resort town of Sozopol (Black Sea), Bulgaria
Language: English

Participants: Participants should be doctoral students whose research concerns the interfaces between computer science, cognitive science, linguistics, mathematics and philosophy. In exceptional cases, postdoctoral researchers as well as outstanding students in the final year of masters-level studies who intend to pursue a doctorate will also be considered. The summer school is limited to 25 students. Places are competitively allocated on the basis of the research interests of the participants and the perceived benefits of attending the summer school to those interests. Participants must be proficient in English.
Stipends: Via the CLaRK Program, the Volkswagen Foundation will provide stipends for up to 6 students from the countries of Central and Eastern Europe and 6 further students from Bulgaria. The stipends will be awarded on a competitive basis. The stipends will cover travel costs (up to DEM 600) and room and board for the duration of the summer school. At the discretion of the CLaRK Program, the stipends may include additional support for travel costs above DEM 600.

Costs: Participants who are not sponsored by a CLaRK stipend should anticipate approximately DEM 125 per day for room and board. Costs for transportation to and from the summer school are not included in this estimate.

Applications: Applications with a completed registration form (available from www.uni-tuebingen.de/IZ/application.rtf), a curriculum vitae, and a short (maximum three pages) summary of relevant past and present research and education must be submitted to the Office of the International Centre at Tuebingen by 30th April 2000. Applicants should indicate whether they are applying for a CLaRK stipend. CLaRK stipend applications must include a letter of recommendation.

Internationales Zentrum fuer Wissenschaftliche Zusammenarbeit
Universitaet Tuebingen
Keplerstr. 17
D - 72074 Tuebingen
Tel.: (0049) 7071 / 29 - 77352 or / 29 - 74156
Fax: (0049) 7071 / 29 5989
e-mail: iz at uni-tuebingen.de
WWW: www.uni-tuebingen.de/IZ/starte.html

Content and Goals

Computational linguistics and knowledge representation are two distinct disciplines that share a common concern with what knowledge is, how it is used, and how it is acquired. However, though knowledge representation and computational linguistics clearly address broadly similar research problems, research within each of these fields has hitherto been largely ignorant of research in the other.
Moreover, the mutual ignorance of the two fields both fosters and is fostered by a wide gulf between the educations received by students of knowledge representation and students of computational linguistics. The goal of the summer school is to help bridge this gulf by introducing the summer school students to recent developments in the interdisciplinary field of computational linguistics and knowledge representation.

The summer school will take the form of courses on various topics. The program provisionally includes courses in computational morphology, corpus linguistics, declarative knowledge representation, natural language semantics, Slavic syntax and psycholinguistics.

Preliminary Course Program

Erhard Hinrichs, Sandra Kuebler: Computational Tools for Corpus Linguistics
Valia Kordoni/Frank Richter: A Comparison of LFG and HPSG
Anna Kupsc: Slavic in HPSG
Detmar Meurers: Introduction to HPSG
Janina Rado: Introduction to Psycholinguistics
Kiril Simov/Gergana Popova: Computational Morphology
Kiril Simov/Atanas Kiryakov: Declarative Knowledge Representation

Contact for further information:
Kiril Ivanov Simov (Sofia): kivs at bgcict.acad.bg
Frank Richter (Tuebingen): fr at sfs.nphil.uni-tuebingen.de
WWW: http://www.sfs.nphil.uni-tuebingen.de/clark/

From pb at lpl.univ-aix.fr Sat Apr 8 09:57:52 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:57:52 +0200
Subject: Conf: Large Corpora and Annotation Standards
Message-ID:
From: "Nancy M.
Ide"

Large Corpora and Annotation Standards
http://www.cs.vassar.edu/~ide/ANLP-NAACL2000.html
Held in conjunction with ANLP/NAACL'00
Seattle, Washington, 4 May 2000, 1-6pm

This meeting is intended to bring together researchers and developers from a variety of domains in text, speech, video, etc., to look broadly at the technical issues that bear on the development of software systems and standards for the annotation and exploitation of linguistic resources. The goal is to lay the groundwork for the definition of a data and system architecture to support corpus annotation and exploitation that can be widely adopted within the community. Among the issues to be addressed are:
- layered data architectures
- system architectures for distributed databases
- support for plurality of annotation schemes
- impact and use of XML/XSL
- support for multimedia, including speech and video
- tools for creation, annotation, query and access of corpora
- mechanisms for linkage of annotation and primary data
- applicability of semi-structured data models, search and query systems, etc.
- evaluation/validation of systems and annotations

The motivation for this meeting is the American National Corpus (ANC) effort, which should begin corpus creation within the year. We anticipate that the ANC will provide a significant resource for natural language processing, and we therefore seek to identify state-of-the-art methods for its creation, annotation, and exploitation. Also, as a national and freely available resource, the data and system architecture of the ANC is likely to become a de facto standard. We therefore hope to draw together leading researchers and developers to establish a basis for the design of a system to support the creation and use of the ANC.
Provisional Program

Overview of the American National Corpus Effort -- Nancy Ide and Catherine Macleod
Searching Linguistically Annotated Corpora -- Chris Brew
Considerations for Large Corpus Annotation: Intercoder Reliability -- Rebecca Bruce and Janyce Wiebe
The XML Framework and Its Implications for Large Corpus Access -- Nancy Ide
The ATLAS System -- John Henderson
Annotation Standards and Their Impact on Large Corpus Development -- Nicoletta Calzolari
A Framework for Multi-level Linguistic Annotation -- Patrice Lopez and Laurent Romary
Discussion: Requirements for the ANC

A related workshop will be held at the LREC conference on May 29-30, 2000. See http://www.cs.vassar.edu/~ide/anc/lrec.html.

Organizer:
Nancy Ide
Professor and Chair
Department of Computer Science
Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 914 437-5988
Fax: +1 914 437-7498
ide at cs.vassar.edu

From pb at lpl.univ-aix.fr Tue Apr 18 16:50:22 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 18 Apr 2000 18:50:22 +0200
Subject: Call: TELRI
Message-ID:
From: "Patrick Ruch"

From pb at lpl.univ-aix.fr Tue Apr 18 16:50:25 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 18 Apr 2000 18:50:25 +0200
Subject: Conf: 2 ANLP/NAACL announcements
Message-ID:

____________________________________________________________________________
1/ From: Priscilla Rasmussen
   Subject: Large Corpora & Annotation Standards at ANLP/NAACL2000
2/ From: radev at si.umich.edu
   Subject: ANLP/NAACL workshop on Automatic Summarization
____________________________________________________________________________
1/ From: Priscilla Rasmussen
Subject: Large Corpora & Annotation Standards at ANLP/NAACL2000

Large Corpora and Annotation Standards
http://www.cs.vassar.edu/~ide/ANLP-NAACL2000.html
Held in conjunction with ANLP/NAACL'00
Seattle, Washington, 4 May 2000, 1-6pm

This meeting is intended to bring together researchers and developers from a variety of domains in text, speech, video, etc., to look broadly at the technical issues that bear on the development of software systems and standards for the annotation and exploitation of linguistic resources. The goal is to lay the groundwork for the definition of a data and system architecture to support corpus annotation and exploitation that can be widely adopted within the community. Among the issues to be addressed are:
- layered data architectures
- system architectures for distributed databases
- support for plurality of annotation schemes
- impact and use of XML/XSL
- support for multimedia, including speech and video
- tools for creation, annotation, query and access of corpora
- mechanisms for linkage of annotation and primary data
- applicability of semi-structured data models, search and query systems, etc.
- evaluation/validation of systems and annotations

The motivation for this meeting is the American National Corpus (ANC) effort, which should begin corpus creation within the year. We anticipate that the ANC will provide a significant resource for natural language processing, and we therefore seek to identify state-of-the-art methods for its creation, annotation, and exploitation. Also, as a national and freely available resource, the data and system architecture of the ANC is likely to become a de facto standard. We therefore hope to draw together leading researchers and developers to establish a basis for the design of a system to support the creation and use of the ANC.
Provisional Program

Overview of the American National Corpus Effort -- Nancy Ide and Catherine Macleod
Searching Linguistically Annotated Corpora -- Chris Brew
Considerations for Large Corpus Annotation: Intercoder Reliability -- Rebecca Bruce and Janyce Wiebe
The XML Framework and Its Implications for Large Corpus Access -- Nancy Ide
The ATLAS System -- John Henderson
Annotation Standards and Their Impact on Large Corpus Development -- Nicoletta Calzolari
A Framework for Multi-level Linguistic Annotation -- Patrice Lopez and Laurent Romary
Discussion: Requirements for the ANC

A related workshop will be held at the LREC conference on May 29-30, 2000. See http://www.cs.vassar.edu/~ide/anc/lrec.html.

Organizer:
Nancy Ide
Professor and Chair
Department of Computer Science
Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 914 437-5988
Fax: +1 914 437-7498
ide at cs.vassar.edu

____________________________________________________________________________
2/ From: radev at si.umich.edu
Subject: ANLP/NAACL workshop on Automatic Summarization

CALL FOR PARTICIPATION
ANLP/NAACL Workshop on Automatic Summarization
Sunday, April 30, 2000
Westin Hotel Seattle, WA 48103

REGISTRATION (until April 20): http://www.gte.com/anlp-naacl2000

SCHEDULE
09:10-09:25 Introduction
09:25-10:15 Session on Content Selection
09:25-09:50 Concept Identification and Presentation in the Context of Technical Text Summarization -- Horacio Saggion and Guy Lapalme, DIRO-Universite de Montreal
09:50-10:15 Mining Discourse Markers for Chinese Textual Summarization -- Samuel W. K. Chan, Tom B. Y. Lai, W. J. Gao, and Benjamin K. Tsou, City University of Hong Kong
10:15-10:40 Session on Visualization
10:15-10:40 Multi-document Summarization by Visualizing Topical Content -- Rie Kubota Ando, Branimir K. Boguraev, Roy J. Byrd, and Mary S.
Neff, Cornell University, and IBM Research
10:40-11:05 Coffee Break (provided)
11:05-12:20 Session on Multi-Document Summarization
11:05-11:30 Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies -- Dragomir R. Radev, Hongyan Jing, Margo Budzikowska, University of Michigan, Columbia University, and IBM Research
11:30-11:55 Extracting Key Paragraph based on Topic and Event Detection - Towards Multi-Document Summarization -- Fumiyo Fukumoto and Yoshimi Suzuki, Yamanashi University
11:55-12:20 Multi-Document Summarization By Sentence Extraction -- Jade Goldstein, Vibhu Mittal, Jaime Carbonell, and Mark Kantrowitz, Carnegie Mellon University and Just Research
12:20-01:50 Lunch Break (on your own)
01:50-03:05 Session on Evaluation
01:50-02:15 A Text Summarizer in Use: Lessons Learned from Real World Deployment and Evaluation -- Mary Ellen Okurowski, Harold Wilson, Joacquin Urbina, Tony Taylor, Ruth Colvin Clark, and Frank Krapcho, Department of Defense, SRA Corp, Clark Training & Consulting, and Kathpal Technologies Inc.
02:15-02:40 Evaluation of Phrase-representation Summarization based on Information Retrieval Task -- Mamiko Oka and Yoshihiro Ueda, Fuji Xerox Co., Ltd.
02:40-03:05 A Comparison of Rankings Produced by Summarization Evaluation Measures -- Robert L. Donaway, Kevin W. Drummey, and Laura A. Mather, Department of Defense and Britannica.com, Inc.
03:05-03:30 Coffee Break (provided) 03:30-04:30 Panel on "Language Modeling in Text Summarization" 04:30-04:55 Session on Multimedia Summarization 04:30-04:55 Using Summarization for Automatic Briefing Generation Inderjeet Mani, Kristian Concepcion, and Linda van Guilder, MITRE Corporation 04:55-06:00 Panel on "Summarization: Industry Perspectives" ___________________________________________________________________ Message distributed by the Langage Naturel list Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/ English version: http://www.biomath.jussieu.fr/LN/LN/ Archives: http://web-lli.univ-paris13.fr/ln/ From pb at lpl.univ-aix.fr Tue Apr 18 16:50:27 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Tue, 18 Apr 2000 18:50:27 +0200 Subject: Conf: RIAO-2000 Message-ID: From: "Karim Chibout" RIAO 2000 6th Conference on "Content-Based Multimedia Information Access" College de France (Paris, France) April 12-14, 2000 _______________ Final Announcement _______________ Organized by: C.I.D. (France) and C.A.S.I.S. (USA) Under the sponsorship of the European Commission, the French Ministry of Education, Research and Technology, the DGA, the CEA, ELRA and ELSNET With the collaboration of AII, ASIS, ESCA and AUF/Francil _______________ For the Final Conference Program and Registration, please visit the Web site: http://host.limsi.fr/RIAO ______________ The theme of the conference is "Content-Based Multimedia Information Access". The conference scope will range from the traditional processing of text documents to the rapidly growing field of automatic indexing and retrieval of images and speech and, more generally, to all processing of audio-visual and multimedia information on various distribution venues, including the Net. The conference is of interest to several scientific communities, including Information Retrieval, Natural Language Processing, Spoken Language Processing, Computer Vision, Human-Computer Interaction and Digital Libraries.
RIAO 2000 will thereby serve as a forum for cross-discipline initiatives and innovative applications. RIAO 2000 will present recent scientific progress, demonstrations of prototypes resulting from this research, as well as the most innovative products now appearing on the market. The Conference Advance Program is highlighted by contributions of authors from 26 countries. The program includes 2 invited speakers, 3 panel sessions, 3 plenary sessions, 8 poster sessions and 16 oral sessions. Among all sessions are 145 papers (75 oral and 70 poster presentations), providing a unique opportunity to present and discuss in depth the state of the art in this rapidly growing scientific and technological field. There will also be many innovative application demonstrations presented by companies from different countries. The application committee has already selected about 20 of them, covering various applications such as crosslingual English-Arabic Internet search, recognition of printed and handwritten texts, television archives retrieval, sign language indexing, machine translation, etc. For more information on the program, conference location and registration, please visit the Web site: http://host.limsi.fr/RIAO or contact us at: - For all scientific matters: riao2000 at limsi.fr - For all organizational, technical and practical matters: cidcol at club-internet.fr ------------------------------------- Joseph MARIANI LIMSI-CNRS BP 133 91403 Orsay Cedex (France) Tel.: (33/0) 1 69 85 80 85 Fax: (33/0) 1 69 85 80 88 Email: mariani at limsi.fr Web: http://www.limsi.fr/ **************************************** Karim Chibout FRANCIL LIMSI-CNRS B.P.
133 91403 Orsay Cedex FRANCE telephone: (+33/0) 1.69.85.80.66 fax: (+33/0) 1.69.85.80.88 e-mail: chibout at limsi.fr http://www.limsi.fr/Individu/chibout/ ******************************************** ___________________________________________________________________ Message distributed by the Langage Naturel list Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/ English version: http://www.biomath.jussieu.fr/LN/LN/ Archives: http://web-lli.univ-paris13.fr/ln/ From pb at lpl.univ-aix.fr Tue Apr 18 16:54:09 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Tue, 18 Apr 2000 18:54:09 +0200 Subject: Jobs: 5 Offers Message-ID: ____________________________________________________________________________ 1/ From: Philippe Monnier Subject: Appel a Candidature 2/ From: MIT2USA at aol.com Subject: MIT2 / CreoleTrans Joint Press Release 3/ From: Naomi Hallan Subject: Chemnitz University 4/ From: Anna Bjarnestam Subject: Linguistic Programmer at Getty Images, Washington USA 5/ From: Kathleen Black Subject: Comp Linguist at Cycorp, Inc., Austin Texas USA ____________________________________________________________________________ 1/ From: Philippe Monnier Subject: Appel a Candidature POSITION: HEAD OF RESEARCH As part of a large-scale technological innovation project supported by ANVAR, STARTEM, a company specialized in the management of international multimedia information, is recruiting a head of research and development. He or she will work in a multidisciplinary setting (computer science, linguistics, ergonomics) on the development of an information-processing chain integrating new information and communication technologies, as part of a cross-functional project team whose objective is to increase productivity. Within this information-processing chain: - He or she will advise the company on the choice of technological solutions and oversee their deployment. - He or she will coordinate the integration of the multilingual NLP modules (text categorization; information extraction; automatic text generation). - He or she will be responsible for presenting the company's technology choices to institutional bodies likely to take an interest in the project. - He or she will coordinate collaborations with research laboratories and supervise the interns and postdocs working on the project. Profile: - Education: PhD or experienced engineer - Areas of expertise: computer science, NLP, linguistics - Essential computing skills: Java, Perl, C++, XML/SGML - Industry experience is required - Experience with European projects would be a plus - Additional training in project management would be an asset - Fluent English required (spoken and written). Procedure: Send a CV and cover letter to Marion Denneulin Email: mdenneulin at cmc.fr STARTEM 60 rue de Ponthieu 75008 Paris ____________________________________________________________________________ 2/ From: MIT2USA at aol.com Subject: MIT2 / CreoleTrans Joint Press Release Strategic Partnering for Haitian Creole Translation Services Mason Integrated Technologies Ltd (MIT2), a software developer and service provider specialized in language processing solutions for Creole languages, and CreoleTrans, a Haitian Creole language translation service provider, announce the forming of a strategic partnership to co-market and cross-sell their workflow management technologies and translation services in order to expand the functionality and effectiveness of both companies. MIT2, creator of the range of CreoleScan(tm) OCR and CreoleConvert(tm) orthography conversion software programs, provides written language stabilization solutions and productivity tools for processing texts in Creole and other minority and vernacular languages.
CreoleTrans is comprised of an experienced team of Haitian Creole translators and editors and has a broad customer base including publishers, universities, schools and education systems, and government agencies. CreoleTrans would be the first Haitian Creole translation provider to use MIT2's software tools for producing and validating Creole (source or target language) texts. For more information, please contact: Mason Integrated Technologies Ltd P.O. Box 181015, Boston, Massachusetts 02118 USA Tel: (+1) 617 247-8885, Fax: (+1) 617 262-8923 E-mail: mit2usa at aol.com Web: http://hometown.aol.com/mit2usa/Index2.html CreoleTrans 470 NE 210 Circle Terrace #203, Miami, FL 33179 USA Tel: (+1) 305 770-9252, Fax: (+1) 305 690-5933 E-mail: info at creoletrans.com Web: http://www.creoletrans.com/ ******* Mason Integrated Technologies Ltd P.O. Box 181015 Boston, MA 02118 USA (617) 247-8885 (office & answering machine) (617) 262-8923 (FAX) MIT2USA at aol.com (e-mail) Mason Integrated Technologies Ltd Home Page: http://hometown.aol.com/mit2usa/Index2.html Orthographically Converted HC Texts Download Site: http://hometown.aol.com/mit2haiti/Index4.html Meet Marilyn Mason: http://hometown.aol.com/marilinc/Index3.html MIT2 Job Opportunities http://hometown.aol.com/mit2usa/JobOpps.html ____________________________________________________________________________ 3/ From: Naomi Hallan Subject: Chemnitz University WANTED: Graduate (Languages/Linguistics/Teaching/...) with good Internet-computing skill The English Linguistics department at the Chemnitz University of Technology is looking for someone to join an on-going research project. The post would be initially for 18 months, with a possibility of an extension if there is a further phase of the project, starting on 1st June 2000 or as soon as possible thereafter. Payment is on the Bat IIa (Ost) scale, with the salary level dependent on age and experience. 
The project, "Learner Behaviour in the Internet Grammar", http://www.tu-chemnitz.de/InternetGrammar/, is part of an inter-disciplinary research group "New media in everyday life", funded by the German Research Association (Deutsche Forschungsgemeinschaft). What is the Internet Grammar? We are building a grammar-learning environment, accessible using a web browser, for advanced learners of English. We are using material from a variety of corpora, including our own English-German Translation Corpus, to provide examples and material for exercises wherever possible. The software infrastructure which we have designed makes it possible to track user behaviour in great detail, and we are hoping to discover how different types of learners interact with language teaching material presented on the Web. What would you do? We need someone to take over the care of our software infrastructure and help us to develop it further. You would work with the other members of the team, who are responsible for designing and writing content and, in part, for preparing it for insertion in the grammar. Your tasks would be: (a) to maintain the existing structures, which involve cgi scripts, corpus search facilities and interactive animations, as well as the more conventional elements of a web site; (b) to assist with the extraction and analysis of learner data; (c) to help extend the functionality of the grammar, both for users and researchers, through the development of new features and the improvement of existing ones. Qualifications: You will have a degree in a relevant subject and be able to show possession of the necessary software skills - such as cgi and perl or javascript, in addition to html - or a willingness to acquire them very rapidly. You should also have an interest in the use of the Web and corpora for language teaching and learning. You should enjoy working in a team and value the opportunity to help with the further development of our project.
The working language of the project is English, so fluency would be an advantage. Working in Chemnitz? Apart from the satisfaction of helping to see an exciting research project through to its completion, you would have the advantage of a stimulating university environment in a city which is growing and changing every day. Low rents for well-equipped apartments in elegant newly restored houses; leafy suburbs, beautiful countryside; a varied and vigorous cultural life (opera, world-class art exhibitions, cabaret . . . ); all these in one of the most enterprising and economically active cities in the "new" German states. Please send a CV and covering letter as soon as possible to: Prof. Dr. Josef Schmied Englische Sprachwissenschaft Technische Universität Chemnitz D-09107 Chemnitz, Germany. e-mail to: realcentre at phil.tu-chemnitz.de For more information about the project: http://www.tu-chemnitz.de/InternetGrammar/ ____________________________________________________________________________ 4/ From: Anna Bjarnestam Subject: Linguistic Programmer at Getty Images, Washington USA Rank of Job: Full Time Permanent Areas Required: Linguistic Programmer Other Desired Areas: Technology University or Organization: Getty Images Department: Getty Technology Group State or Province: Washington Country: USA Final Date of Application: 05/30/2000 Contact: Anna Bjarnestam anna.bjarnestam at gettyimages.com Address for Applications: Getty Images, 701, N 34th Street, Suite 400 Seattle WA 98103 USA Job - Linguistic Programmer Responsibilities - Getty Images has vast sources of text attached to imagery that need to be indexed automatically in some manner for searching and retrieval purposes. Primary responsibilities involve development of a semantic or syntactic tagger for natural language English. The tagger should be based on a controlled vocabulary developed and currently in use at Getty Images.
The most important aspect of the work is the programming of these NLP tools, rather than finding linguistic solutions for functionality designs etc. Other job tasks involve metadata integration projects, various smaller NLP tool developments and machine-readable vocabulary development. This work is mainly for the creative professional (gettyone) and the editorial Getty Images channels (gettysource), see http://www.gettyimages.com Qualifications - - Strong programming skills (knowledge of C++, Perl or other) - Some knowledge of grammatical theories is preferred - Some understanding of NL parsing theory (which may include statistical and/or corpus-based parsing methods, tagger development) - Experience in computational lexicography or computational linguistics and online dictionary development and awareness of current NLP technology and available vocabularies - A degree in linguistics or computer science or a closely related discipline is preferred. ____________________________________________________________________________ 5/ From: Kathleen Black Subject: Comp Linguist at Cycorp, Inc., Austin Texas USA Rank of Job: -- Areas Required: -- Other Desired Areas: -- University or Organization: Cycorp, Inc. Department: Natural Language Development State or Province: TX Country: USA Final Date of Application: none Contact: Kathleen Black kat at cyc.com Address for Applications: 3721 Executive Center Drive, Suite 100 Austin TX 78731 USA Cycorp (http://www.cyc.com/) has begun to harness the power of its Cyc(TM) common sense knowledge base and reasoning system to do semantic and pragmatic disambiguation of English. Currently we are working on new and exciting clarification dialogue interfaces for Cyc itself and for Cyc-based applications. These include applications for smart Web searching, question and answer dialogues, and speech understanding, to name just a few. Join the team building this one-of-a-kind interactive dialogue front end.
You will create formal representations of natural language expressions and phenomena, as well as develop applications to exploit such representations. Candidates for these positions must be familiar with formal logic, and have sound fundamentals in English usage, syntax, and semantics. In addition, one or more of the following would be a plus: - Knowledge of discourse structure, pragmatics, and dialog modeling - Experience with the influence of semantic distinctions on syntax - Familiarity with formal semantic analysis - Facility with knowledge representation and other AI tools and techniques - Knowledge of constraint-based grammatical theories - Understanding of NL parsing theory (statistical, corpus-based, etc.) - Experience in applying that knowledge (computational NLU systems) - Experience in computational lexicography - Experience in natural language generation - Knowledge of NL interface design and human cognitive considerations - Programming skills, especially in Lisp or Scheme General Information: All technical positions at Cycorp involve working with the Cyc(TM) technology -- an immense, broad, multi-contextual knowledge base and efficient inference system which our group has developed over the last 16 years and 400 person-years. The Cyc knowledge base, spanning fundamental "consensus" human knowledge, enables a multitude of knowledge-intensive products and services which will revolutionize the way in which people use and interact with computers: semantic information retrieval, consistency-checking of structured information, deductive integration of heterogeneous data bases, natural language interfaces able to cope with realistic levels of ambiguity/terseness/contextualization, and many more. Cycorp is located in Austin, TX. We are an equal opportunity employer.
For more information about employment at Cycorp, visit our website at http://www.cyc.com/employment.html For immediate consideration, please send your resume and a cover letter to Kathleen Black at the following address: Cycorp, Inc. 3721 Executive Center Drive, Suite 100 Austin TX 78731 Internet: info at cyc.com Telephone: +1 (512) 342-4000 Fax: +1 (512) 342-4040 No person shall be excluded from consideration for recruitment, selection, appointment, training, promotion, retention, or any other personnel action, or be denied any benefits or participation in any activities on the grounds of race, religion, color, national origin, sex, handicap or age. Cycorp will hire only persons authorized to work in the United States and will verify identity and eligibility for employment, and complete form I-9 for all new employees within three (3) business days of the date of hire. ___________________________________________________________________ Message distributed by the Langage Naturel list Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/ English version: http://www.biomath.jussieu.fr/LN/LN/ Archives: http://web-lli.univ-paris13.fr/ln/ From pz at biomath.jussieu.fr Wed Apr 19 11:44:32 2000 From: pz at biomath.jussieu.fr (Pierre Zweigenbaum) Date: Wed, 19 Apr 2000 12:44:32 +0100 Subject: R: Debugging computational grammars Message-ID: Date: Tue, 18 Apr 2000 08:42:50 +0100 From: Christian Boitet Message-Id: Dear colleague, 18/4/2000 Sorry I left this unanswered for so long. In Ariane-G5 (presented at the last ATALA WS on tools, some more on my web site & in the literature), we have developed quite powerful tools to debug our grammars, which are actually made of modules consisting of transformation rules. 1) Only a modular organisation of the rule systems allows one to do that efficiently. In ROBRA, for instance, we have 3 levels: transformational system, grammar, rule.
2) Provide for a variety of traces, at each of these levels, of course in terms of the external specialized language (SLLP) used by the linguist-developer. 3) If you develop really large applications, a specification level such as that of Vauquois' "static grammars" (ref. in Vauquois Analects 1989 & in Zarin's articles) is most crucial to create and maintain your computational grammars in an orderly way. As an anecdote, I was personally able to debug a particular point of an English-Thai prototype without knowing anything about the computational grammars developed, or even about Thai. I was with Pr. Udom Warutamasikkhadit, who told me something was wrong in a translation. I ran the sentence in question, tracing the output trees after AS (structural analysis), before GS (structural + syntactic generation), & after it. Looking at the static grammar "boards" for Thai, he told me the input to generation was OK, the output not. I then went to the ROBRA module & the 3-4 rules indicated in the board as implementing the construct in question, and easily detected that one rule schema did not correspond to the board. I corrected it, reran, and it worked. All this in less than 15 minutes, and the computational grammars in question represented (for analysis, transfer & generation) several hundred pages in source form (including of course comments). 4) One originality of ROBRA (our language for writing tree transformational systems) is that tracing and stepping are modular and graded: - one may trace or not each main step of the automaton: prefiltering, labelling (= finding all possible rule occurrences), choice (conflict resolution and production of the set of rule occurrences to apply in parallel to the tree -- which may represent several paragraphs and contain thousands of nodes), and transformation proper. - there are 4 levels of (global and dynamic) tracing, 1-4, and each trace point also has a (local and static) tracing grade.
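The graded scheme described here (a global, dynamic trace level combined with a local, static grade at each trace point, emitting a trace only when their sum exceeds 4) can be sketched in a few lines. This is a hypothetical illustration only, not ROBRA's actual code; the class and variable names are invented:

```python
# Sketch (invented names) of ROBRA-style graded tracing: each trace
# point carries a local, static grade; the tracer holds a global,
# dynamic level; a message is emitted only when their sum exceeds
# the threshold of 4 used in Boitet's description.

THRESHOLD = 4

class Tracer:
    def __init__(self, level=0):
        self.level = level  # global, dynamic trace level (1-4)

    def trace(self, grade, message):
        # grade: local, static grade attached to this trace point
        if self.level + grade > THRESHOLD:
            print(message)
            return True
        return False

tracer = Tracer(level=3)
tracer.trace(1, "prefiltering: candidate rules")      # 3 + 1 = 4, not emitted
tracer.trace(2, "labelling: rule occurrences found")  # 3 + 2 = 5, emitted
```

Raising the global level thus progressively uncovers more trace points without touching the grammar source, which is the stepping behaviour described below.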
Whether the trace is produced depends on whether, at a given point, the sum exceeds 4. In this way, one can step and see more or less detail. - in the same spirit, any active tree (contained in the stack) can be visualized in 4 or 5 geometric forms, with only the lexical units, or with the complete decoration of each node. -- in a future implementation, we should add a graphical, mouse-sensitive interface, allowing one to examine individual nodes by clicking them. This domain is still full of interesting possibilities. Yours very sincerely, Ch.Boitet >Date: Thu, 23 Sep 1999 10:23:59 +0200 (MET DST) >From: Alberto Lavelli >Message-Id: <199909230824.KAA16161 at ecate.itc.it> > > >Dear colleagues, > >I'm looking for references to the problem of developing and debugging >computational grammars for natural languages. I'm particularly >interested in tools and approaches used in debugging grammars >(particularly in their use when dealing with relatively large >hand-written grammars). In the computational systems I'm aware of, >usually there is only a limited (and standard) set of debugging tools: >tracers, steppers, chart browsers. > >Furthermore, does anybody know any extensive study on the most >suitable strategies/tools to cope with the writing/testing/debugging >cycle (always with a particular emphasis on debugging)? > >I know that there have been hints to this problem in related areas >(e.g., the EU projects TSNLP and DiET, some papers at the ACL-EACL97 >workshop on Computational Environments for Grammar Development and >Linguistic Engineering) but it seems to me that this topic has so far >received little attention. But perhaps I'm missing some relevant >contributions and so I'm asking for your help. > >Apart from references to relevant stuff, I'm also interested in your >general opinion on the issue. Is this (alleged) lack of interest an >indication of the fact that such issue is in your opinion not >particularly relevant?
> > >I'll post a summary if I receive enough responses > > >best > alberto > > >ps: I have sent this message to several mailing lists. I apologize if >you receive it more than once. ------------------------------------------------------------------------- Christian Boitet (Pr. Universite' Joseph Fourier) Tel: +33.4-7651-4355/4817 GETA, CLIPS, IMAG-campus, BP53 Fax: +33.4-7651-4405 385, rue de la Bibliothe`que Mel: Christian.Boitet at imag.fr 38041 Grenoble Cedex 9, France Mobile: +33-(0)6-6005-1969 http://www-clips.imag.fr/geta/christian.boitet ------------------------------------------------------------------------- C-STAR project (http://www.c-star.org/) and European Nespole project (http://nespole.itc.it) on speech translation UNL project on multilingual communication and information retrieval on the network http://www.unl.ias.unu.edu or http://www.unl.org ___________________________________________________________________ Message distributed by the Langage Naturel list Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/ English version: http://www.biomath.jussieu.fr/LN/LN/ Archives: http://web-lli.univ-paris13.fr/ln/ From pb at lpl.univ-aix.fr Thu Apr 20 08:08:18 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Thu, 20 Apr 2000 10:08:18 +0200 Subject: Appel: Linguistic Exploration Workshop Message-ID: From: Steven Bird First Announcement: LINGUISTIC EXPLORATION WORKSHOP 12-15 December 2000 Institute for Research in Cognitive Science University of Pennsylvania, Philadelphia Organized by Steven Bird (U Penn) and Gary Simons (SIL) http://www.ldc.upenn.edu/exploration/ Linguistic Exploration is a theme which unites linguists and computational linguists who are engaged in empirical research on large datasets through the combination of traditional field methods with new technologies for representing, investigating and disseminating linguistic data.
The languages under study may range from the undescribed to the well-studied, and the "fieldworker" may operate in a village or a laboratory. The focus is language documentation, coupled with an exploratory mode of research where elicitation, analysis and hypothesis-testing form a tight loop. At the January LSA in Chicago, a one-day workshop was held on computational infrastructure for linguistic fieldwork. Full materials from this workshop, including abstracts, presentations and audio recordings, are online at http://www.ldc.upenn.edu/exploration/LSA/. A second workshop will be held in Philadelphia in December 2000. The goal of this workshop is to align the many parallel efforts in this area, and to establish a research agenda which will provide the infrastructure for a new generation of computational tools. Please bookmark http://www.ldc.upenn.edu/exploration/ and join the mailing list to be sure of receiving future announcements. -- Steven Bird sb at ldc.upenn.edu http://www.ldc.upenn.edu/sb ___________________________________________________________________ Message distributed by the Langage Naturel list Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/ English version: http://www.biomath.jussieu.fr/LN/LN/ Archives: http://web-lli.univ-paris13.fr/ln/ From pb at lpl.univ-aix.fr Thu Apr 20 08:08:19 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Thu, 20 Apr 2000 10:08:19 +0200 Subject: Conf: INLG'2000 Message-ID: From: "International Natural Language Generation Conference-Dr. Elhadad" International Natural Language Generation INLG'2000 Mitzpe Ramon, Israel Workshops: 12 June 2000 Main conference: 13-16 June 2000 First Call For Participation The First International Natural Language Generation Conference (INLG'2000) will be held June 12 to 16, 2000 in Mitzpe Ramon, Israel. This conference continues in the tradition of the nine biennial workshops on natural language generation that have been held from 1980 to 1998.
INLG'2000 will offer a larger audience the opportunity to participate in the main meeting of researchers in the field. Following the tradition of previous INLG meetings, the conference will be held in an isolated and stunning natural environment: the Ramon Inn hotel, in Mitzpe Ramon, Israel. The hotel is located on the edge of the Ramon Crater, in the middle of the Negev Desert. Conference Main Topics: * Generation and summarization * Multimodal and multimedia generation * Multilingual generation * Concept to speech, models of intonation * Strategic generation for text and dialogue * Text planning, discourse models, argumentation strategies, content selection and organization * Tactical generation, formalisms and models of grammar, sentence aggregation, lexical choice * Architecture of generators * Knowledge acquisition and resources for generation and summarization * User-customized generation and summarization * Psychological modeling of discourse production * Learning methods for generation * Evaluation methodologies for generation and summarization * Applications of generation, concept-to-speech, information extraction, and information retrieval techniques to summarization, report generation, and explanation. The conference is organized in four tracks: 1. Main session 2. Student session 3. Workshops 4. Special session on evaluation in generation The registration form for the main conference is available at our homepage: http://www.cs.bgu.ac.il/~nlg2000 The registration form for the workshops will be available soon, as well as the full program. Registration will be accepted until May 15th. After this date, a late registration fee will be required.
------------------------------------------------------------------------ Programme Committee * Michael Elhadad, Ben Gurion University, Israel (Chair) * Stephan Busemann, DFKI, Germany * Graeme Hirst, University of Toronto, Canada * James Lester, North Carolina State University, USA * Inderjeet Mani, The MITRE Corporation, USA * Kathy McCoy, University of Delaware, USA * David McDonald, Gensym Corp, USA * Dragomir Radev, University of Michigan, USA * Jacques Robin, Federal University of Pernambuco, Brazil * Donia Scott, University of Brighton, UK * Manfred Stede, Technical University, Berlin, Germany * Matthew Stone, Rutgers University, USA * Ingrid Zukerman, Monash University, Australia Student Session * Irene Langkilde, University of Southern California - ISI * Charles Brendan Callaway, North Carolina State University * James Shaw, Columbia University Special Session on Evaluation * Inderjeet Mani, The MITRE Corporation Equipment Availability Presenters will have available an overhead projector, a slide projector, a data projector (Barco) which will display from laptops, and a VHS (PAL) videocassette recorder. NTSC format may be available; if you anticipate needing NTSC, please note this information in your proposal. Requests for other presentation equipment will be considered by the local organizers; requests for special equipment should be directed to the local organizers no later than May 15, 2000. ------------------------------------------------------------------------ Local Arrangements * Michael Elhadad elhadad at cs.bgu.ac.il * Yael Dahan Netzer yaeln at cs.bgu.ac.il Dept.
of Computer Science Ben Gurion University P.O.Box 643 Beer Sheva 84105 Israel ------------------------------------------------------------------------ ___________________________________________________________________ Message distributed by the Langage Naturel list Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/ English version: http://www.biomath.jussieu.fr/LN/LN/ Archives: http://web-lli.univ-paris13.fr/ln/ From pb at lpl.univ-aix.fr Thu Apr 20 08:07:58 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Thu, 20 Apr 2000 10:07:58 +0200 Subject: Appel: TALN-2000 (Deadline extension) Message-ID: From: TALN2000 ********************************************************************** * ##### ### ### ### * * ##### ## # # # # # # # # # # # * * # # # # ## # # # # # # # # * * # # # # # # # ##### # # # # # # * * # ###### # # # # # # # # # # # * * # # # # # ## # # # # # # # * * # # # ###### # # ####### ### ### ### * * * * * * TALN 2000 * * Traitement Automatique du Langage Naturel * * * * École Polytechnique Fédérale de Lausanne * * du 16 au 18 octobre 2000 * * * * http://liawww.epfl.ch/taln2000/ * * * ********************************************************************** LAST CALL FOR PAPERS DEADLINE EXTENSION TALN 2000 Swiss Federal Institute of Technology Lausanne, Switzerland 16-18 October 2000 CALENDAR submission deadline: May 5th, 2000 notification to authors: June 23rd, 2000 final version due (camera-ready): August 4th, 2000 conference: 16-18 October, 2000 Jointly organised by the Swiss Federal Institute of Technology (Lausanne) and the University of Geneva, the Seventh Conference on Natural Language Processing (TALN 2000) will be held at the Swiss Federal Institute of Technology (Lausanne, Switzerland) on October 16-18, 2000. The conference includes paper presentations, invited speakers, tutorials and software demonstrations. The official conference languages are French and English.
TALN 2000 is organised in collaboration with ATALA (Association pour le Traitement Automatique des LAngues) and will be held jointly with the young researchers' conference RECITAL 2000 (a separate call for papers will follow).

TOPICS

Papers are invited for 30-minute talks (including questions) in all areas of NLP, including (but not limited to):
   lexicon
   morphology
   syntax
   semantics
   pragmatics
   discourse
   parsing
   generation
   abstraction/summarisation
   dialogue
   translation
   logical, symbolic and statistical approaches
   mathematical linguistics

TALN 2000 also welcomes contributions in fields for which NLP plays an important role, as long as these contributions emphasise their NLP dimension:
   speech processing
   text processing
   cognition
   terminology, knowledge acquisition
   information retrieval
   document retrieval
   corpus-based linguistics
   mathematical linguistics
   management and acquisition of linguistic resources
   computer-assisted learning
   NLP tools for linguistic modelling

TALN 2000 also welcomes submissions focusing on NLP applications that have been implemented, tested and evaluated, emphasising the scientific aspects and the conclusions drawn.

Software demonstrations may be proposed, either independently or in connection with a paper proposal. Specific demo sessions will be scheduled in the conference programme.

The programme committee will select two of the accepted papers for publication (in an extended version) in the journal "Traitement Automatique des Langues" (t.a.l.). For the journal, these papers will have the status "accepted, subject to modifications", the required modification being reformatting to the journal's style.

SELECTION

Authors are invited to submit original, previously unpublished work. Submissions will be reviewed by at least two specialists of the domain.
Decisions will be based on the following criteria:
 - importance and originality of the paper
 - soundness of the scientific and technical content
 - comparison of the results obtained with other relevant work
 - clarity
 - relevance to the topics of the conference

Accepted papers will be published in the proceedings of the conference.

SUBMISSION PROCEDURE

The maximum length for papers is 10 pages in Times 12 (approx. 3000 words), single-spaced, including figures, examples and references. The maximum length for demo proposals is 3 pages. A LaTeX style file and a Word template will be available on the web site of the conference (http://liawww.epfl.ch/taln2000).

Electronic submissions must reach the organising committee by May 5th, 2000 at the latest, at the following address:
   taln2000 at latl.unige.ch

If electronic submission is not possible, 3 hard copies of the paper must reach the organising committee before April 21st, 2000, at the following address:
   Eric Wehrli - TALN 2000
   Dépt. de linguistique - LATL
   Université de Genève
   2, rue de Candolle
   CH-1211 Genève 4
   Suisse

File format for electronic submissions: authors should send their submission as a file attached to an e-mail, with the subject field "TALN submission" and containing the following information: submission title, first author's name, affiliation, postal address, e-mail address, phone and fax numbers. Submissions are ANONYMOUS and should therefore not include the authors' names or any self-reference. One of the following formats MUST be used:
 - self-contained LaTeX source (including non-standard styles) AND a PostScript version;
 - RTF (Word) document AND a PostScript or PDF version.
All PostScript versions must be for A4 paper, not US letter.

PRACTICAL INFORMATION

Practical information will shortly be available on the conference web site (http://liawww.epfl.ch/taln2000/).
----------------------------------------------------------------------

LAST CALL FOR PAPERS
SUBMISSION DEADLINE EXTENSION

CALENDAR
Submission deadline:              May 5th, 2000
Notification to authors:          June 23rd, 2000
Final (camera-ready) version due: August 4th, 2000
Conference:                       16-18 October 2000

Jointly organised by the École Polytechnique Fédérale de Lausanne and the University of Geneva, the seventh conference on Natural Language Processing (TALN 2000) will be held on October 16-18, 2000 at the École Polytechnique Fédérale de Lausanne, Switzerland. The conference will include scientific papers, invited talks, demonstration sessions and tutorials. The official conference languages are French and English.

TALN 2000 is organised under the aegis of ATALA (Association pour le Traitement Automatique des LAngues) and will be held jointly with the young researchers' conference RECITAL 2000 (a separate call for papers will follow).

TOPICS

Papers, presented in thirty-minute talks (questions included), may address all the usual topics of NLP, including (but not limited to):
   lexicon
   morphology
   syntax
   semantics
   pragmatics
   discourse
   parsing
   generation
   summarisation
   dialogue
   machine translation
   logical, symbolic and statistical approaches

TALN 2000 also welcomes work from neighbouring fields in which NLP plays an important role, provided the emphasis is on the NLP component:
   speech processing (prosody, linguistics, pragmatics)
   text processing
   cognitive aspects
   terminology, knowledge acquisition from texts
   information extraction
   document retrieval
   corpus linguistics
   mathematical linguistics
   use of NLP tools for linguistic modelling
   computer-assisted instruction, language teaching
Also expected are papers on NLP applications that have been implemented and evaluated, highlighting their scientific aspects and the lessons learned.

System demonstrations may be proposed, independently of or in connection with scientific papers. The conference schedule will include a session for these demonstrations.

The programme committee will select two of the accepted papers for publication (in an extended version) in the journal Traitement Automatique des Langues (t.a.l.). These papers will be considered by the journal as "accepted subject to modification", the modification being reformatting to the journal's style.

SELECTION CRITERIA

Authors are invited to submit original research work that has not been published before. Submissions will be reviewed by at least two specialists of the domain. The following will in particular be considered:
 - importance and originality of the contribution,
 - soundness of the scientific and technical content,
 - critical discussion of the results, in particular with respect to other work in the field,
 - position of the work within the context of international research,
 - organisation and clarity of the presentation,
 - relevance to the topics of the conference.

Accepted papers will be published in the conference proceedings.

SUBMISSION PROCEDURE

Submitted papers must not exceed 10 pages in Times 12, single-spaced, i.e. about 3000 words, including figures, examples and references. Demonstration proposals must not exceed 3 pages. A LaTeX style file and a Word template will be available on the conference web site (http://liawww.epfl.ch/taln2000/).
Papers must reach the organising committee before May 5th, 2000, in electronic form, at the following address:
   taln2000 at latl.unige.ch

If electronic submission is impossible, a paper submission may be accepted: three hard copies of the contribution must be sent to the following address:
   Eric Wehrli - TALN 2000
   Dépt. de linguistique - LATL
   Université de Genève
   2, rue de Candolle
   CH-1211 Genève 4
   Suisse

Submission format: authors must send their submission as a document attached to an e-mail with the subject "TALN submission", giving the title of the paper and the name, affiliation, postal address, e-mail address, phone and fax numbers of the main author. Submissions must be anonymous and must therefore contain no author name or self-citation. One of the following formats MUST be used:
 - self-contained LaTeX source (non-standard styles, or styles differing from those provided for TALN 2000, must be included in the source file) AND PostScript;
 - RTF (Word) AND PostScript or PDF.
PostScript versions must be in A4 format, not US letter.
PRACTICAL INFORMATION

Practical information will be provided later, in particular on the conference web site (http://liawww.epfl.ch/taln2000/).

----------------------------------------------------------------------

ORGANIZING COMMITTEE
Eric Wehrli (President)
Martin Rajman
Cristian Ciressan
Jean-Cédric Chappelier
Marie Decrauzat
Paola Merlo
Christopher Laenzlinger

PROGRAM COMMITTEE
Pascal Amsili, TALaNa (Paris)
Susan Armstrong, ISSCO (Genève)
Nicholas Asher, University of Texas (Austin)
Afzal Ballim, EPFL (Lausanne)
Philippe Blache, LPL (Aix-en-Provence)
Christian Boitet, CLIPS-GETA (Grenoble)
Pierrette Bouillon, ISSCO (Genève)
Didier Bourigault, CNRS (Paris)
Jean-Pierre Chanod, XEROX Research Centre (Grenoble)
Cédric Chappelier, EPFL (Lausanne)
Béatrice Daille, IRIN (Nantes)
Dominique Estival, University of Melbourne
Claire Gardent, Universität des Saarlandes (Saarbrücken)
Damien Genthial, CLIPS-IMAG (Grenoble)
Gregory Grefenstette, XEROX
Michael Hess, Uni Zurich
Pierre Isabelle, XEROX Research Centre (Meylan)
Daniel Kayser, LIPN (Paris)
Geert-Jan Kruijff, Univerzita Karlova (Praha)
Eric Laporte, CERIL, Université de Marne-la-Vallée
Paola Merlo, LATL (Genève)
Piet Mertens, CCL K.U. Leuven
Jacques Moeschler, LATL (Genève)
Cécile Paris, CSIRO (Sydney)
Jean-Marie Pierrel, LORIA (Nancy)
Alain Polguère, Université de Montréal
Martin Rajman, EPFL (Lausanne)
Owen Rambow, AT&T Labs-Research
Gérard Sabah, LIMSI (Paris)
Jacques Savoy, Uni Neuchâtel
Jacques Vergne, GREYC (Caen)
Jean Véronis, LPL (Aix-en-Provence)
Eric Wehrli, LATL (Genève)
François Yvon, ENST (Paris)
Brigitte Zellner Keller, UNIL (Lausanne)
Pierre Zweigenbaum, DIAM (Paris)

Contact:
   Eric Wehrli - TALN 2000
   Dépt. de linguistique - LATL
   Université de Genève
   2, rue de Candolle
   CH-1211 Genève 4
   Switzerland
   Tel: +41-22-705.73.63
   Fax: +41-22-705.79.31
   email: taln2000 at latl.unige.ch
   http://liawww.epfl.ch/taln2000/

--
For the organising committee of TALN 2000,
J.-C. Chappelier

From pb at lpl.univ-aix.fr Tue Apr 25 17:54:03 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:54:03 +0200
Subject: Appel: RECITAL 2000
Message-ID:

From: Damien Genthial

----------------------------------------------------------------------

Last Call for Papers
NEW SUBMISSION DEADLINE: May 5th, 2000

RÉCITAL-2000
Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues
École Polytechnique Fédérale de Lausanne (Switzerland)
16-18 October 2000
http://www-clips.imag.fr/RECITAL-2000
RECITAL-2000 at imag.fr

held jointly with TALN-2000
http://liawww.epfl.ch/taln2000

The fourth edition of the RECITAL conference will be held jointly with the TALN-2000 conference (Traitement Automatique des Langues Naturelles) in Lausanne (Switzerland) on October 16-18, 2000. RECITAL-2000 gives young researchers the opportunity to meet, present their work and compare their approaches.
The highly successful pairing of RECITAL with TALN in 1999 will be repeated this year and should become systematic in the future. This pairing gives participants a broader view of the most recent theoretical advances as well as of industrial applications currently under development. It also fosters exchanges with the established researchers taking part in TALN.

* Indicative topics

Classical problems:
 - Text analysis and understanding
 - Natural language generation
 - Machine translation
 - Summarisation
 - Automatic correction
 - Human/machine dialogue

Related problems:
 - Linguistic resources (lexicons, electronic dictionaries, corpora, etc.)
 - Lexical semantics (polysemy, categorisation, etc.)
 - Semantics of time and space (representation and reasoning)
 - Logic (argumentation, modelling of intentions and beliefs, etc.)
 - Architectures dedicated to NLP (multi-agent systems, neural networks)
 - Acquisition and machine learning of resources or knowledge (from corpora or from human/machine interaction)
 - Relations between NLP and speech recognition

This list is not exhaustive, and the relevance of a submission to the conference will be judged by the programme committee.

* Submissions

Submissions (6 pages maximum in Times 12) must be written in French or in English by young researchers (thesis in progress, or defended after September 1999) and accompanied by a 200-word abstract. RECITAL-2000 is the student session of TALN-2000, and an author may submit to several events (TALN, RECITAL, workshops); in that case, the double submission must be clearly indicated in the message addressed to both programme committees. Submissions to RECITAL-2000 will not be anonymous, and reviewers will be encouraged to sign their reports.
Submissions will be exclusively electronic and must be sent as an attached file to: RECITAL-2000 at imag.fr. Authors must send their submission in PostScript, RTF or PDF. A LaTeX style file and a Word template are available on the TALN-2000 web pages (see above).

Accepted papers will be presented in the RECITAL sessions as twenty-minute talks and will appear in the TALN-2000 proceedings.

* Important dates (same calendar as TALN-2000)
Submission deadline:              May 5th, 2000
Notification to authors:          June 23rd, 2000
Final (camera-ready) version due: August 4th, 2000
Conference:                       16-18 October 2000

* Programme committee
Chair: Damien Genthial (mailto:Damien.Genthial at imag.fr)
Pascal Amsili (TALANA, Paris)
Pierre Beust (GREYC, Caen)
Jean Caelen (CLIPS, Grenoble)
Paul Deléglise (LIUM, Le Mans)
Cécile Fabre (ERSS, Toulouse)
Bertrand Gaiffe (LORIA, Nancy)
Emmanuel Giguet (GREYC, Caen)
Stéphane Ferrari (GREYC, Caen)
Brigitte Grau (LIMSI, Orsay)
Maurice Gross (LADL, Paris)
Jean-Luc Husson (LORIA, Nancy)
Eric Laporte (Marne-la-Vallée)
Jérôme Lehuen (LIUM, Le Mans)
Gérard Ligozat (LIMSI, Orsay)
Daniel Luzzati (LIUM, Le Mans)
Denis Maurel (LI, Tours)
Reza Mir-Samii (LIUM, Le Mans)
Jacques Moeschler (Genève)
Philippe Muller (IRIT, Toulouse)
Anne Nicolle (GREYC, Caen)
Didier Pernel (L&H, Belgique)
Violaine Prince (Paris 8)
Martin Rajman (EPFL, Lausanne)
Laurent Romary (LORIA, Nancy)
Azim Roussanaly (LORIA, Nancy)
Gérard Sabah (LIMSI, Orsay)
Patrick Saint-Dizier (IRIT, Toulouse)
Jean Senellart (LADL, Paris)
Jacques Siroux (IRISA LLI/CORDIAL, Rennes)
Max Silberztein (LADL, Paris)
Jacques Vergne (GREYC, Caen)
Jean Véronis (LPL, Aix)
Anne Vilnat (LIMSI, Orsay)
Michael Zock (LIMSI, Orsay)

* Organizing committee
Chair: José Rouillard (CLIPS-IMAG Grenoble)
Pierre Beust (GREYC Caen)
Peggy Cadel (LILLA Nice)
Jean Caelen (CLIPS-IMAG Grenoble)
Damien Genthial (CLIPS-IMAG Grenoble)
Stéphanie Pouchot (CRISTAL-GRESEC Grenoble)
Dominique Vaufreydaz (CLIPS-IMAG Grenoble)

For any information:
RÉCITAL-2000 web page: http://www-clips.imag.fr/RECITAL-2000
RÉCITAL-2000 e-mail address: RECITAL-2000 at imag.fr
Postal address:
   Colloque RÉCITAL-2000
   à l'attention de Damien Genthial
   CLIPS, IMAG-Campus
   BP 53
   38040 Grenoble cedex

* Practical information
For all practical information concerning accommodation, access to the site in Lausanne, etc., please see the TALN web pages (http://liawww.epfl.ch/taln2000).
-------------------------------------------------------------------------------
TRILAN/CLIPS/IMAG (Mondays and Thursdays) | IUT de Valence (other days)
Tel: 04 76 51 49 51                       | Tel: 04 75 41 88 00

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:12 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:12 +0200
Subject: Appel: CULT 2000
Message-ID:

From: "cult2k"

CULT 2000
CONFERENCE ANNOUNCEMENT AND CALL FOR PAPERS
(with apologies if you receive multiple copies)

Second International Conference on
CORPUS USE AND LEARNING TO TRANSLATE
Bertinoro, Italy
Friday 3 November and Saturday 4 November 2000

AIMS AND TOPICS

CULT 2000 follows up the 2-day International Workshop organized by the School for Interpreters and Translators of Bologna University in Forlì in November 1997 (http://www.sslmit.unibo.it/cult.htm). The aim of the Conference is to bring together practitioners and theorists sharing an interest in the design and use of corpora in translation-related areas, with special reference to translator and interpreter training.
Contributions in the form of papers, demonstrations and posters are sought on the following topics:
- Translation/interpreting-specific issues relating to the design, development and use(s) of corpora
- Integrating corpus work into courses for translators/interpreters
- Corpus-based language learning/teaching for translators/interpreters
- Implications of corpus use with respect to theories of translation/interpreting
- The respective roles of corpora, conventional reference tools, and other computational translation aids
- The World Wide Web as a resource for translation/interpreting
- Corpora and terminology
- Corpus-based descriptive translation studies in the classroom

KEYNOTE SPEAKERS
Kirsten Malmkjær (Middlesex University)
Tony McEnery (Lancaster University)

VENUE
Bertinoro is a beautiful little town in the Romagna hills, renowned for its warm hospitality and its good wine. The University Conference Centre is set in a recently renovated medieval fortress dominating the town. The view stretches from the mountains of Tuscany to the Adriatic Sea. You can have a look for yourself at: http://www.spbo.unibo.it/bertinoro/eindice.html

SCIENTIFIC COMMITTEE
Guy Aston (University of Bologna)
Mona Baker (UMIST - Manchester)
Lynne Bowker (University of Ottawa)
Jennifer Pearson (Dublin City University)
Stig Johansson (University of Oslo)
Krista Varantola (University of Tampere)

ORGANIZING COMMITTEE
Silvia Bernardini
Dominic Stewart
Federico Zanettin

ADDRESS FOR CORRESPONDENCE
SSLMIT - CULT 2000
Corso della Repubblica 136
47100 Forlì
Italy
Tel.: +39 0543 450 307/304
Fax: +39 0543 450 306
e-mail: cult2k at sslmit.unibo.it
WWW: http://www.sslmit.unibo.it/cult2k/

PROPOSALS
Proposals for contributions relating to any of the topics listed above should reach the organizing committee no later than June 15, complete with abstracts of about 500 words. All proposals will be reviewed.
For further details about the submission procedure please refer to the Conference web page or contact the organizing committee at the addresses above.

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:24 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:24 +0200
Subject: Conf: NLP-2000
Message-ID:

From: "Aristides Vagelatos"

ANNOUNCEMENT

2nd International Conference on Natural Language Processing
NLP 2000: Filling the gap between theory and practice
URL: http://www.cti.gr/nlp2000
2, 3 & 4 June 2000, Conference and Cultural Center
University of Patras - Greece

ORGANIZED BY:
- Computer Technology Institute of Patras
- University of Patras: Computer Engineering Department (Database Laboratory), Philology Department (Linguistics Section)
- University of Athens: Informatics Department
- University of the Aegean: Information & Communication Systems Department

OBJECTIVES

We feel that this is the most opportune time for a critical view of the achievements both in theory and in practice, and for developing bridges towards building the emerging advanced systems and services that will provide the breadth of information envisaged. The aim is to fill the gap between theory and practice, so that developments and needs in theory can inform new technological methods and applications, and vice versa. The goal is to bring together people who will attest to the progress of the field and disseminate it to a wider audience.

Conference Secretariat:
Mrs. Penelope Kontodimou
P.O.
Box 1421
University of Patras
GR - 26 500 Patras - Greece
Email: pinelop at cti.gr
Tel: (+3061) 960.383
Fax: (+3061) 997.783

Program Committee Chair:
Dimitris Christodoulakis (University of Patras), Greece, e-mail: dxri at cti.gr

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:26 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:26 +0200
Subject: Ecole: CLaRK'2000
Message-ID:

From: Frank Richter

Second announcement: Summer School 2000 in Bulgaria - CLaRK'2000

The Tuebingen-Sofia International Graduate Programme in Computational Linguistics and Represented Knowledge (CLaRK) invites applications for a summer school in Sozopol, Bulgaria, this summer. Please note the slight change of dates since the first announcement.

*NEW* Dates: August 25th - September 8th, 2000 (days of arrival and departure)
Place: resort town of Sozopol (Black Sea), Bulgaria
Language: English

Participants: Participants should be doctoral students researching the interfaces between computer science, cognitive science, linguistics, mathematics and philosophy. In exceptional cases, postdoctoral researchers as well as outstanding students in the final year of masters-level studies who intend to pursue a doctorate will also be considered. The summer school is limited to 25 students. Places are competitively allocated on the basis of the research interests of the participants and the perceived benefit to those interests of attending the summer school. Participants must be proficient in English.

Stipends: Via the CLaRK Programme, the Volkswagen Foundation will provide stipends for up to 6 students from the countries of Central and Eastern Europe and 6 further students from Bulgaria.
The stipends will be awarded on a competitive basis. The stipends will comprise travel costs (up to DEM 600) and room and board for the duration of the summer school. At the discretion of the CLaRK Programme, the stipends may include additional support for travel costs above DEM 600.

Costs: Participants who are not sponsored by a CLaRK stipend should anticipate approximately DEM 125 per day for room and board. Costs for transportation to and from the summer school are not included in this estimate.

Applications: Applications with a completed registration form (available from www.uni-tuebingen.de/IZ/application.rtf), a curriculum vitae, and a short (maximum three pages) summary of relevant past and present research and education must be submitted to the Office of the International Centre at Tuebingen by 30th April 2000. Applicants should indicate whether they are applying for a CLaRK stipend. The event number that the registration form asks for is 5. CLaRK stipend applications must include a letter of recommendation.

Internationales Zentrum fuer Wissenschaftliche Zusammenarbeit
Universitaet Tuebingen
Keplerstr. 17
D - 72074 Tuebingen
Tel.: (0049) 7071 / 29 - 77352 or / 29 - 74156
Fax: (0049) 7071 / 29 5989
e-mail: iz at uni-tuebingen.de
WWW: www.uni-tuebingen.de/IZ/starte.html

Content and Goals

Computational linguistics and knowledge representation are two distinct disciplines that share a common concern with what knowledge is, how it is used, and how it is acquired. However, though knowledge representation and computational linguistics clearly address broadly similar research problems, research within each of these fields has hitherto been largely ignorant of research in the other. Moreover, the ignorance the two fields have of each other both fosters and is fostered by a wide gulf between the educations received by students of knowledge representation and students of computational linguistics.
The goal of the summer school is to help bridge this gulf by introducing the summer school students to recent developments in the interdisciplinary field of computational linguistics and knowledge representation. The summer school will take the form of courses in various topics. The program provisionally includes courses in computational morphology, corpus linguistics, declarative knowledge representation, natural language semantics, Slavic syntax and psycholinguistics.

Preliminary Course Program
 Erhard Hinrichs, Sandra Kuebler: Computational Tools for Corpus Linguistics
 Valia Kordoni/Frank Richter: A Comparison of LFG and HPSG
 Anna Kupsc: Slavic in HPSG
 Detmar Meurers: Introduction to HPSG
 Janina Rado: Introduction to Psycholinguistics
 Kiril Simov/Gergana Popova: Computational Morphology
 Kiril Simov/Atanas Kiryakov: Declarative Knowledge Representation
 Kiril Simov/Atanas Kiryakov: WordNets: Principles and Applications

A short description of the courses can be found on the CLaRK web pages: http://www.sfs.nphil.uni-tuebingen.de/clark/

The expected guest speakers include Nicola Guarino from the University of Padova, Italy (www.ladseb.pd.cnr.it/infor/people/Guarino.html).
Contact for further information:
Kiril Ivanov Simov (Sofia): kivs at bgcict.acad.bg
Frank Richter (Tuebingen): fr at sfs.nphil.uni-tuebingen.de
WWW: http://www.sfs.nphil.uni-tuebingen.de/clark/

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:28 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:28 +0200
Subject: Projet: Ontology
Message-ID:

From: Patrick Cassidy

April 22, 2000

The following note contains a follow-up to some discussions held at the meeting of the Association for Computational Linguistics (ACL) last year, and is now being brought to the attention of a wider group. It is being sent to a number of different list servers as well as to the membership of the ACL, and I apologize for what will inevitably be some duplication. Please send all comments directly to me.

Best regards,
Pat

=============================================
Patrick Cassidy
MICRA, Inc.               || (908) 561-3416
735 Belvidere Ave.        || (908) 668-5252 (if no answer)
Plainfield, NJ 07062-2054 || (908) 668-5904 (fax)
internet: cassidy at micra.com
=============================================

To: Members of the Association for Computational Linguistics and others with an interest in knowledge representation, lexicons, and lexical semantics
From: Patrick Cassidy (cassidy at micra.com)
Subject: A Request to Participate in a Study of the Utility of a Standard Ontology and Lexicon for Natural Language Understanding (NLU) and database interoperability

==============
Background
==============

In recent years there has been a great deal of effort in building lexicons, ontologies, and terminologies, both for the purposes of basic research and for practical applications.
The advantages of common formats and common content, which allow reuse of results between groups, have been widely recognized, but the practical funding situation has required in most cases that individual groups focus on relatively narrow aspects of the general problem. Efforts have also been underway for years, within and between a number of groups, to develop common resources to promote interchange of data, to compare results, and to reference and organize the results of the many groups who have prepared valuable resources. These very valuable projects have helped mitigate the difficulty of preparing and finding useful ontologies and lexical resources.

However, there is still little prospect that these multiple projects will lead in the near future to a unified common ontology and lexicon that has sufficient detail and functionality to be adopted by a large number of groups as a reference standard, and which can be used directly, without substantial modification, for a variety of purposes in research and practical applications. Of special value would be the development of a common defining vocabulary of concepts and associated words and relations, sufficient to define all of the specialized concepts and words used in applications. The ability to use a common vocabulary to define the concepts and words in diverse applications would provide a level of interoperability unavailable by any other means, except one-by-one coordination between projects.

The question arises whether it is now possible to build on the large body of existing data and experience to construct such a reference standard within a tightly coordinated single project. The goal would be to create a database that is as inclusive as possible of the results and intuitions of previous research and development efforts, and to include as many as possible of the current practitioners in the project to build this resource.
The main problem is that development of a basic but realistically large ontology and lexicon for Computational Linguistics research will require a project to coordinate a group -- probably a consortium of dispersed academic and industrial participants -- of a size that will require substantial funding. Though large by the standards of most NLP research projects, such a coordinated effort would still be modest by comparison with funding for important research tools in other areas of science, such as space probes, particle accelerators, or telescopes. Skepticism about the possibility of congressional funding for such a project is understandable, but there is ample precedent for obtaining special congressional funding of tools for research. What is needed is to show that the costs will be repaid by the usefulness of this database both for research and for construction of advanced applications. At a minimum there should be a survey to identify the potential users of a standard ontology and lexicon. In the eventuality that special congressional funding could not be obtained, this will still be useful to help move toward building common resources by other means. At the annual meeting of the ACL in Maryland in June 1999 I helped organize a "birds-of-a-feather" meeting to discuss whether there is at present a need and an opportunity to build a large but basic ontology and lexicon for use in NLU research and applications. Among the 23 that participated in the discussion, most had expended some effort building lexicons and ontologies for natural language understanding, but some members were present who had not themselves participated directly in such efforts. We spent over an hour discussing mostly the technical question of what kind of ontology could be useful for natural language understanding, and the political questions of whether it would be practical to attempt to get agreement at this time among ontology developers with different views of how to proceed. 
The view was almost unanimous that such a project should be attempted, though it was recognized as technically and organizationally complex. There was also a large degree of skepticism as to whether we could convince Congress to fund such a large project. We had hoped to have a wider discussion among the general membership of the ACL, but as it turned out the general business meeting ran well over its allotted time, and when I raised the issue there was no time for discussion; a motion was therefore made and passed that I should form a committee to study the question and report back to a future meeting. This note is the first request for participation in such a committee. The question of constructing a reference ontology for Computational Linguistics and for database interoperability has already been discussed over several years within the ANSI T2 ad hoc committee on ontologies. That ad hoc committee is no longer actively meeting, and this note, with its suggested formation of a study committee, is in part an attempt to fill the void left by the discontinuation of those discussions. One of the conclusions of those discussions was that substantially increased funding would be needed for a coordinated effort, in order to move the development of useful ontologies beyond the current stage, in which isolated groups each pursue their own ideas, generally incompatible with or very difficult to merge with those of other groups. The present note is intended to bring the issues addressed by the T2 committee to a wider group, and to form a committee that can develop objective information justifying the substantial funding needed for a unified project. As mentioned, the complexity and size of such a project, which would require a tightly coordinated effort with funding substantially larger than that of a typical CL research project, make it likely that special funding would have to be obtained directly from Congress. 
To obtain such funding it will be necessary to show that there is a significant group of established researchers who have been active in building lexicons and ontologies, who believe that building a standard reference is technically feasible at present, and who believe that such a reference would be used widely enough to justify the expense. One can find expressions of such a belief in private conversations and in published papers, as well as in the existence of research efforts to build common lexical and ontological resources. To begin developing a well-organized proposal that can be considered seriously by Congress, what is needed is a more formal study presenting the findings of a broadly representative group rather than of an individual or single research group. This request for participation is only a first step in developing such a proposal. The specific purposes for organizing this committee and the subjects for discussion are: (1) to determine the general characteristics of an ontology and lexicon that would incorporate as much as possible of the results and insights of those who have already spent many years doing research on lexicons, ontologies, knowledge representation, terminologies, and lexical semantics, and that would be broadly useful for both research and applications; and (2) to estimate where and to what extent such a database, if built, would in fact be used. Quantitative data about potential areas of use would be especially valuable, to demonstrate that construction of such a database would be worth the cost. The structure of this committee is open to discussion. I would suggest that anyone with experience in any of the relevant fields should be able to vote on any proposals for which a measurement of opinion is needed; individuals wishing to participate as voting members should inform me of that before the end of May. 
Discussions will be conducted by e-mail (I will forward comments to a list of interested persons), unless someone is willing to set up a listserver for this purpose (perhaps an existing listserver should be used?). Individuals willing to prepare a report of the potential uses of a defining ontology/lexicon in specific areas of research or in applications would receive and summarize copies of any data or suggestions relevant to their area, sent from any interested person. The number of possible summaries is not limited, but will probably be small. Any individual is free to make any comments, and all comments received will be forwarded to anyone wishing to receive them, unless they are specifically intended not for distribution. I do not anticipate that at this stage any degree of agreement could be reached about any details of the structure of a common ontology or lexicon, but some summary could be prepared of the various alternatives that might be suggested. I hope that at the NAACL-2000 meeting in Seattle in the first week of May, some preliminary indication could be obtained about how many individuals would be willing to participate as voting members and/or report writers. I do not have a fixed timetable in mind, but probably three months will be sufficient time for interested parties to determine potential uses and send in comments. The timing of subsequent actions will depend on the wishes of the voting members of the committee. All persons interested in this project in any way should contact me by e-mail (cassidy at micra.com) or telephone (908-561-3416). Suggestions about how to organize an informal study of this type would also be welcome, but need to be sent soon to be useful. 
It will be worthwhile to include in this study a summary of all ontological and lexical resources currently available, and I hope that some representative of every group that has built any form of ontology, terminology, or other lexical resource, which is now available to the public or might become part of a common reference ontology/lexicon, would send me a brief summary of their projects and a reference to the location of any existing data available publicly. There are already several web sites on which pointers to the locations of such resources are listed, and the owners of those sites and those who have prepared other lists of available resources are encouraged to send a copy of the lists they have already prepared. The complete summary of references to such resources submitted will be published as part of the report of the committee. The data that are most needed to determine potential utility of a reference database will be estimates of how much such a common ontology or lexicon would be used. For this purpose, anyone who would be likely to even try using it should send a note indicating the type of system in which it would be used and how it would be used, and how much more efficiently the system might function. I would expect that anyone currently using an ontology or semantic network would want to try such an ontological lexicon, and if there are those who would not try it, the reasons for this skepticism will probably serve as useful input. One of the important questions to be answered is whether one can estimate potential utility in quantitative terms, and if so, how. 
The likelihood of the ontology being used in one's own system may be expressed in any way, but at least three levels can be distinguished: (1) those who would be willing to participate in construction of such an ontological lexicon; (2) those who would be likely to adopt a standard ontology or lexicon, if it existed; and (3) those who would try using a standard ontology or lexicon, to test its utility. Descriptions of potential commercial uses would be especially valuable for convincing Congress that funding is justified. For example, estimates have been made that electronic commerce over the internet will amount to 425 billion dollars by 2001 (IEEE Intelligent Systems, Jan/Feb 1999, "Let's Go Shopping" by Michael McCandless, pp. 2-4). Labor costs in sales transactions tend to run about 10%, so the costs of executing those transactions would be about 40 billion dollars. If these costs could be reduced by 1% through efficiencies generated by the use of a standard knowledge representation scheme, the savings would amount to 400 million dollars per year. The total cost of developing such an ontology would then be paid back in less than six months. One can make similar estimates for other activities which use advanced computer programs, and find similar likely savings. Thus even a minuscule improvement in the efficiency of computer programming, or of the use of computer programs, would appear to make this project cost-effective. However, estimates of this type will be far more convincing if those involved in the development or use of programs which have, or should have, semantic elements can provide more accurate and objectively based estimates for specific examples. In the best case, an industrial group that maintains a database already using an ontology to enhance its functionality might estimate, for example, that an ontology of the type described would likely improve the efficiency of the program by, say, 5%. 
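For concreteness, the back-of-envelope arithmetic above can be reproduced directly. The figures are the estimates quoted in the text (the cited IEEE Intelligent Systems projection and assumed percentages), not measured data:

```python
# Back-of-envelope estimate from the text; all inputs are the article's
# illustrative figures, not measurements.
ecommerce_total = 425e9   # projected annual e-commerce volume, dollars
labor_fraction = 0.10     # labor costs as a share of sales transactions
savings_fraction = 0.01   # assumed efficiency gain from a standard ontology

# Cost of executing transactions: ~42.5 billion (the text rounds to ~40).
transaction_costs = ecommerce_total * labor_fraction

# Annual savings from a 1% efficiency gain: ~425 million (text rounds to 400).
annual_savings = transaction_costs * savings_fraction

print(f"transaction costs: ${transaction_costs / 1e9:.1f} billion")
print(f"annual savings:    ${annual_savings / 1e6:.0f} million per year")
```

Even with the text's rounded figures, the point survives arithmetic scrutiny: savings of a few hundred million dollars per year would repay a development effort of that scale within months.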
This number, multiplied by annual sales of the program, could provide a crude estimate of economic benefit. There are several obvious difficulties in making such estimates, starting with the fact that we don't know what the final database will look like. But even very crude estimates from people familiar with a potential use will be better than wild guesses from those with little familiarity. Groups which have already built an ontology or a semantic lexicon can review the costs of developing their own system and determine, if a common ontology would be useful to them, the direct cost savings of adopting a standard ontology rather than constructing an enhanced version of their own system. Even without an economic justification of that type, building this database should be justifiable if it is used primarily as a research tool. Accordingly, I hope that we can obtain comments from all individuals who would be likely to use such a tool in their research or in building applications, as well as from those who wish to comment on the desirable structure of such a database. I plan to organize a birds-of-a-feather meeting at the upcoming NAACL-2000 conference in Seattle (April 29-May 3) where those who are willing to consider serving on this committee can meet to discuss questions of form and substance of a study such as this, as well as any comments that have been received at that point. Responses should therefore be sent to me by e-mail if possible before the 27th of April, or they can be presented and discussed at the meeting in Seattle. This study will continue for at least three months, so comments will be welcome and are likely to be valuable after the meeting as well. In the discussions I had concerning this topic with other attendees at the 1999 ACL meeting, the first question was of course what type of ontology is being proposed. 
The general structure as well as detailed technical questions can only be resolved in the course of preliminary discussions among those who will participate in the construction of the database, as well as in the construction phase itself. But for the sake of discussion, I have described below some characteristics that will likely need to be included in such a database. The final form of the ontology, if it is to be useful for Computational Linguistics, will have to include substantial lexical knowledge, or will have to be tightly integrated with lexicons built separately. Rather than calling it an "ontology" it might better be referred to as an "ontological lexicon," although there should be a core conceptual component in the ontology which will be language-neutral. One of the purposes of forming this committee is to obtain a wider range of comments concerning desiderata for the structure of such a database. In addition to questions about how such an ontological lexicon would be structured, many at the ACL meeting had other questions. I have reproduced below most of the questions that were asked, and indicated some potential answers. It may well be that nothing suggested here will ultimately be accepted unchanged in the final result of constructing this database, but the important point is that construction of some such database will be essential to provide a common tool permitting more effective widespread collaboration in research toward human-level understanding and generation of language. ======================================== What Kind of Ontology is Being Proposed? 
======================================== What is being discussed here is the need for a database having two main components: (1) an upper ontology of fundamental concepts, represented in logical format, which are sufficient to serve as the building blocks for construction of all of the more complex concepts used in any given field; and (2) a basic lexicon of defining words, in which the word meanings are represented using the same set of fundamental concepts, and which are sufficient to define all of the words of the language. Each word in the lexicon will also have an associated definition using the defining vocabulary, which will in some cases look like an ordinary dictionary definition. Over time, both the ontology and the lexicon can be expanded to include more specialized or less common concepts, but the main goal for the initial phase should be to specify the minimum set of defining concepts, semantic relations, and axioms for the ontology, and the minimum set of defining words for the associated lexicon. This description evades some controversial issues regarding what constitutes "words" and "definitions". It is understood that many polysemous words have vague or plastic meanings, dependent on context, and for such words an exhaustive list of meanings cannot be specified; and many words cannot be defined by necessary and sufficient conditions. All that can be recorded in a database of this kind are the necessary characteristics of word meanings, perhaps with some markers indicating where variations in meaning can be expected in linguistic usage. This will be an attempt to record as much as can be agreed on about basic words and concepts at the present state of the field. Applications that need to handle ill-defined words will need additional structure beyond what can be included in a standardized lexicon. 
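As an illustration only, the two-component design described above might be modeled along the following lines. Every concept name, field, and lexical entry here is invented for the sketch; nothing is drawn from any existing ontology or lexicon:

```python
# Hypothetical sketch of the two-component database: an upper ontology of
# fundamental concepts plus a lexicon whose entries are defined in terms of
# those concepts. All names are illustrative assumptions, not a proposal.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str                 # language-neutral concept identifier
    parents: list = field(default_factory=list)  # links into the upper ontology

@dataclass
class LexicalEntry:
    word: str                 # surface form (English, for the initial phase)
    concept: str              # the concept this word lexicalizes
    definition: str           # gloss restricted to the defining vocabulary

# Component (1): a toy upper ontology of fundamental concepts.
ontology = {
    "Entity": Concept("Entity"),
    "Animal": Concept("Animal", parents=["Entity"]),
    "Canine": Concept("Canine", parents=["Animal"]),
}

# Component (2): a toy lexicon whose definitions use only defining words.
lexicon = {
    "dog": LexicalEntry("dog", "Canine",
                        "an animal of the canine kind commonly kept as a pet"),
}

print(lexicon["dog"].concept)      # the concept behind the word
print(ontology["Canine"].parents)  # its place in the upper ontology
```

The point of the sketch is only the division of labor: the ontology holds language-neutral concepts and their relations, while the lexicon ties language-specific words to those concepts together with a human-readable definition in the defining vocabulary.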
The conceptual component of this database would be equivalent to an "upper ontology" or "top ontology" (although this term is used by different people to indicate ontologies of somewhat different sizes). Specifying the meanings of words using a basic ontology of this type constitutes, in effect, a theory of the meanings of the words. A realistic lexicon will need to include not only single words, but fixed collocations and probably also word combinations that are not normally considered idioms but have some non-compositional character. The lexicon can include not only the word meanings in logical format, but any other data associated with word meaning or usage which is useful for applications. For example, in addition to part-of-speech or etymological data, the lexicon could include verb case frames, which would duplicate to some extent data in the verb definitions, but in a different format, perhaps easier to use for some purposes. Statistical data on word associations would be another useful component: though not essential, it could easily be included when available. Specifics of what will be included and how the data will be structured can only be decided by those participating in the construction of the database; the remaining comments in this section are personal suggestions, which may not be adopted by the project participants. The conceptual elements in the ontology will be defined in a logical format, but there are two principles which could make the database more widely acceptable and easier to use: (1) concepts which are not lexicalized in any language as single words or fixed collocations can be included in the ontology, but should be used only where there is some cogent need, and all concepts in the ontology should have an associated definition in some language (usually English); (2) ideally there will be a "definition parser" that can take such a defining string and produce the logical structure it is intended to define. 
The emphasis in this project is on the most general words and concepts, so that a common defining vocabulary of concepts can be developed which, if used for defining terms in specific applications, will allow some significant level of conceptual communication between applications developed by independent groups. Applications that process complex information but are not required to understand linguistic phrases, such as database applications or electronic commerce, can use the ontology, and in theory could ignore the lexicon. Linguistic applications would use the lexicon, and, if any level of conceptual understanding is required, would also use the word definitions in logical format, which will usually also require the use of the basic ontology. (In some cases a linguistic application may use the lexicon and associated definitions with minimal reasoning, and the lexicon would function in such cases as a thesaurus or simple semantic network, such as WordNet). Different ontologies have already been developed by a number of different groups for various purposes, but in general their structures are so different that transferring information from one system to another is very time-consuming or error-prone. The difference between this ontological theory and others which have been proposed thus far lies mostly in the size of the database and the extent to which it will both include and represent a consensus of the different theories (i.e., ontologies and lexical semantic representations) that have been developed thus far by independent groups. What would be very useful for both research and applications development is to have at least one well-developed defining vocabulary freely available to all potential users, constructed by representatives of most or all of the existing ontology and lexicon groups and containing as much as possible of the compatible information which each of these groups could contribute to a common effort. 
In addition to the core database, user interfaces and applications programming interfaces should be developed, as an integral part of the project, to make the database as easy as possible to learn and use. The representations of the concepts, and through them the meanings of words, will ultimately need to be specified at a logical level that allows automatic reasoning. The existing Knowledge Interchange Format (KIF) and Conceptual Graphs (CG) standards could serve as well-defined, theory-neutral formats for storing the meaning representations. To be useful for computational linguistics, a considerable amount of lexical information should also be included. This distinguishes the proposed database from that of CYC, which placed primary emphasis on utility in reasoning. Another important distinction is that the database must be public domain, or at least freely and easily available over the internet for research, as is the WordNet system. Without free availability to any potential research or applications group, developing the necessary agreements between groups may be impossible, and most of the utility will be lost. The ontology that emerges from such a project will most likely have some variant of the typical structure of a set of entities connected by relations, since this is the basic model of meaning representation which has been universally adopted, though with some significant differences between implementations. The relationships may be thought of as semantic relations or as axioms of the ontology, but it is understood that, to be useful for reasoning, the semantic relations must be defined with sufficient precision that the logical implications of one entity having a specific relation to another can be calculated unambiguously. Although in many ontologies the hierarchy has received the most attention, it is equally important that the semantic relations be fully agreed upon and well-defined. 
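To make that requirement concrete, here is a toy sketch (hypothetical concept and relation names throughout) of an entity-relation store in which one relation, "is-a", carries a precisely defined logical implication -- transitivity -- that can be computed unambiguously, while a second relation deliberately carries no such implication:

```python
# Hypothetical entity-relation store: (subject, relation, object) triples.
# "is-a" is defined as transitive; "part" is included only for contrast.
relations = {
    ("Canine", "is-a", "Animal"),
    ("Animal", "is-a", "Entity"),
    ("Canine", "part", "Tail"),
}

def subsumes(general, specific, rels):
    """True if `specific` falls under `general`, following only the
    "is-a" relation transitively -- the unambiguous logical implication
    attached to that relation in this sketch."""
    # Map each subject to the set of its direct "is-a" parents.
    parents = {}
    for (s, r, o) in rels:
        if r == "is-a":
            parents.setdefault(s, set()).add(o)
    frontier, seen = {specific}, set()
    while frontier:
        node = frontier.pop()
        if node == general:
            return True
        seen.add(node)
        frontier |= parents.get(node, set()) - seen
    return False

print(subsumes("Entity", "Canine", relations))  # transitive is-a chain holds
print(subsumes("Tail", "Canine", relations))    # "part" links are not followed
```

The design point is that the inference is licensed by the definition of the relation itself, not by ad hoc code: any two systems that agree on the definition of "is-a" will compute the same implications from the same triples, which is exactly the interoperability the paragraph above calls for.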
The set of basic concepts and semantic relations needed will be those which are necessary and sufficient to provide logical definitions of any of the concepts, and by extension the words, which will be used in applications. In effect, what is needed is to create a dictionary with definitions of the words, and a parallel ontology with the same definitions expressed in a logical format suitable for automatic reasoning. The lexicon that labels the concepts of the ontology should include all of the basic words needed to define all of the other words of the language; the "words" of the language must eventually include all collocations which are to any degree non-compositional, that is, whose meanings cannot be deduced as a predictable combination of the meanings of the individual component lexical strings. The lexicon cannot at the initial stage be comprehensive, but it should also contain those common collocations, such as those produced by the lexical functions of Mel'cuk, which are either essential for generation of fluent colloquial language, or so commonly used that their inclusion will improve the speed or accuracy of the language understanding process. As a practical matter, to demonstrate the potential uses of such an ontological lexicon and to facilitate development of a user interface that will permit widespread use, there should be a detailed implementation of this basic defining vocabulary to define specialized concepts in at least two different areas. Two that come to mind are the medical area, where the basic defining vocabulary could be integrated with the UMLS system and its metathesaurus, and the military area, where significant effort has already been expended to apply the CYC ontology. These two are, not coincidentally, areas of interest to governmental agencies. Integration with other specialized ontologies or lexicons might be proposed and performed by individual groups as part of the project. 
Enterprise models, manufacturing, electronic commerce or planning ontologies would be additional candidates. The primary motivation for developing a common theory of meaning is to allow a greater degree of re-use of research results in computational linguistics, as well as more direct communication between different implemented systems which have a linguistic or conceptual component. ============================================ Why do we need a common defining vocabulary? ============================================ Any difference between two systems in the internal representation of words or concepts must inevitably lead to some difference in the inferences that the two systems make from the same data. Thus without some common basis for defining the meanings of the different concepts used in different systems, the transfer of knowledge between systems will be impossible, time-consuming, or highly error-prone. The need for a common vocabulary of defining concepts is felt not only in the field of natural language understanding, where communication is the primary goal, but also in other fields of Artificial Intelligence, wherever conceptual information painstakingly entered into one system could be useful in another system. It is clear that in some areas of research in Natural Language, semantic representation of word meanings is less important than in others. Research in speech-to-text conversion, for example, and in parsing methodologies, has progressed without the use of semantics. Statistical methods have also been shown to be useful for some practical purposes, though the extraction of the meanings of texts is beyond the capabilities of such a methodology by itself. It is also true that groups doing research with systems which will not interact at a conceptual level with other systems have a great degree of freedom in choosing representations of meaning which may be suitable for their purposes even if not usable in other systems. 
We would hope that groups whose research does not immediately require detailed semantic representation of meanings will nevertheless recognize its importance for the progress of research in language understanding, and not raise objections to this project unless the objections address the feasibility of the goal. The developers of an ontological lexicon will be those groups working specifically on methods to represent word meanings, but the need for a common representation of the meanings of words and texts is felt directly also by those whose research involves some level of understanding, such as in information extraction, message understanding, word sense disambiguation, text categorization, machine translation, and database interoperability. The difficulties caused by a lack of common conceptual representations impact not only NLU and the database and expert systems that CYC has been applied to; they affect many areas of AI. In a recent issue of IEEE Intelligent Systems (January/February 2000) several commentators discussed the state of AI, and some of those comments reflect this problem indirectly: Nils Nilsson commented that "AI shows all the signs of being in what the late Thomas Kuhn called a pre-paradigmatic, pre-normal-science stage. It has many ardent investigators, arrayed in several camps, each claiming to have the essential approach to intelligence in machines... It might be that intelligence is the kind of multiplex for which no single science or paradigm will ever emerge." Donald Michie stated: "The most notable nontrend [in AI] has resulted from consistent disregard of the closing section, Learning Machines, of Turing's 1950 paper. A two-stage approach is there proposed: 1. Construct a teachable machine. 2. Subject it to a course of education. 
Far from incorporating Turing's incremental principle, even the most intelligent of today's knowledge-acquisition systems forget almost everything they ever learned every time their AI masters turn to the next small corner of this large world." A common basis for representation of knowledge will help to overcome these problems and to move the field closer to the normal scientific paradigm, enabling more rapid advances by allowing investigators to study the same phenomena and compare details of results more directly. In computational linguistics research, having at least one common detailed theory of word meanings for the defining vocabulary will provide a powerful tool for progress toward the ultimate goal of human-level language understanding. =============================================================== Wouldn't it be better to develop a common ontology cumulatively by contributions from existing research groups rather than try to build a larger unified project? =============================================================== The construction of an ontological lexicon for natural language understanding differs in several important ways from most areas of scientific research, where ideas and results from small independent groups provide the bulk of the individual contributions that evaluate or elaborate the theories of each field. The predominance of original contributions from small groups holds in most areas of natural language research as well, but for construction of a large ontology and lexicon for use as a research tool, the usual research process is less effective. The main problem is the size and complexity of a realistic ontology, and the intimate and multiple interrelations of its component parts. Specifying the meanings of the defining vocabulary means building a fundamental ontology of concepts and then constructing a theory of the meanings of words using those concepts. 
This endeavor has more of the character of an engineering project than of a research project, in that it is the construction of an artifact with many complex interacting parts. It may in theory be possible to achieve the same result eventually through small independent contributions of ideas and elements, but such a process is likely to be much slower than a coordinated project, and less likely to achieve the goal of a widely accepted reference standard within any foreseeable time frame. In addition, the time lost in pursuing the development of a common ontology through uncoordinated effort may well prove eventually much more expensive, through the lower efficiency both of research and of the implemented programs developed in the interim, than would the development of the same database by a single, adequately funded, coordinated effort. Furthermore, the problems of coordinating groups with different approaches to ontology development, admittedly difficult even in a single properly funded project, might well be insurmountable without the impetus of deadlines for agreement on specific subproblems within an overall plan of development. One possible alternative is the elaboration of an existing ontology, such as WordNet, by the cumulative addition of new functions or data. This will, one may hope, proceed in any case until a coordinated project is funded. But in order to accumulate into a unified system, there would still need to be a prime coordinator -- in this case presumably the WordNet group. Their own views would then necessarily predominate, and since these have been driven by specific goals and objectives different from the goals of other groups, the resulting database would not represent the best common approach to the varied problems, as would a project initiated de novo for the specific purpose of answering a wide range of research and practical goals. 
It is also difficult to imagine that the total cost of proceeding in that fashion would in the end be any less than that of a single coordinated project, which would also incorporate input from WordNet as well as from other existing systems. The worst-case scenario is one in which several commercial concerns develop proprietary versions of a natural-language ontology, the largest part of which is not publicly available. That is currently the case with the CYC project, and it appears to be the direction in which Microsoft's "MindNet" project is heading. If such a situation develops, there will be not one but several competing "standards", none of which will be easily available to researchers; even if available to some degree, they could not be enhanced and redistributed by most of those who could improve such a system. Such systems will not serve the purpose of providing a common test bed in which new ideas for representing word meanings can be tried by many research groups in realistically large systems, with results distributed to the research community at large. Proprietary systems are also likely to be less reliable than a public one, and their behavior will be unpredictable to anyone outside the development group. ================================================================= Would non-U.S. groups be eligible to participate in this project? ================================================================= Much important work on ontologies has been performed outside of the U.S., and I would expect that participation by non-U.S. groups would be welcomed; indeed it would be essential if the resulting ontology, which should be language-neutral, is intended to serve as a standard throughout the scientific community. 
Since the emphasis would be on creating a defining vocabulary of general concepts sufficient to define all specialized concepts, the experience of those whose native language is other than English will be particularly valuable in recognizing when useful basic concepts are lexicalized in one language but not in others. There are already several European projects aimed at the construction of common ontological and lexical resources, and it would be a great loss if those groups did not participate in an inclusive effort. The language-specific elements of the lexicon will of necessity concentrate first on English, since creating a computational lexicon of even one language is already a very large task. Groups from the UK could of course work on the English lexicon. But if at all possible, groups with experience in automatic translation or other multilingual applications should be asked to participate, since some of the more subtle and difficult problems in knowledge representation may be highlighted by the difficulties found in accurate translation. It is difficult to predict to what extent the inclusion of lexicons for other languages will be feasible; groups which presently concentrate on translation will presumably want to include their parallel lexicons for languages other than English. Ideally, the European research funding agencies might fund European groups willing to coordinate their work with this project, who could concentrate on non-English languages. ================================================================ My notions of how to represent concepts change every few weeks. How can we fix on a single representation at this time? Do we know enough at present to justify a major project? 
================================================================

It goes without saying that an ontological lexicon, like the language it represents, will change over time, but a legitimate question is at what point it becomes appropriate to undertake a first effort to construct a standard tool that can be used and tested by the entire research community. There have been no major changes in the prevailing entity-relationship paradigm for representing knowledge over the past ten years, and the paradigm has been sufficiently well investigated at a fundamental level that there seems to be no reason to delay trying to build a consensus ontological lexicon based on the best knowledge now available. Such a lexicon will provide a research tool that can help to discover the strengths and weaknesses of different aspects of this paradigm, and it can include all the elements deemed important by those who have been studying meaning representation for some time. The database can then be thoroughly and widely tested for conformity to the realities of language use, and for utility in reasoning about data. The main motive for this project is the observation, from prior experience, that the fundamental concepts of any language are so intimately connected with each other that no theory of the meaning of any of its component concepts can be tested in a realistic setting unless some consistent representation of the entire fundamental vocabulary is available. We therefore need a starting point: a realistically large database representing most of the fundamental concepts of a language, so that we can effectively test whether any specific individual components conform to the way people actually use words and concepts.

================================================================
For how long will the ontology constructed be useful? Isn't it likely to change and need modification or replacement?
================================================================

Based on the lifetimes of existing ontologies, we can expect that a major effort at developing a standard ontology will result in a database that remains useful for research and practical purposes for at least ten years. To avoid becoming outdated, the ontological lexicon will need a core group providing a continuing maintenance effort, at a minimum level perhaps one-fifth as intense as the initial development. It is conceivable that some fundamentally different structure for meaning representation will eventually be proposed and widely accepted, in which case it would be difficult to predict how much of the structure of this proposed ontology would be reusable. More likely, though, the ontology will continue to be useful for decades through modification, replacement, or addition of components, with most of the structure remaining stable for years. It is also unlikely that any new meaning-representation paradigm could gain wide acceptance unless some substantial effort such as this one provides a basis for thorough testing of the entity-relation model on a realistic scale. As a theory of the meaning of words, this database will doubtless be modified and elaborated, as are most scientific theories. Theories in general are tools for organizing research; they provide a framework in which to formulate tests to confirm or refute aspects of the theory. They are useful for a time to make collaborative research on a topic possible, after which they may be modified or abandoned. In a theory with as many individual parts as an upper ontology, we can assume that some parts will be found inadequate for some purposes, while others may remain unmodified for a long time. The core maintenance group, or perhaps a committee with broad representation, would be responsible for making and publicizing the changes in each new revision.
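To picture the entity-relation style of representation discussed above, here is a minimal sketch; every concept and relation name in it is invented for illustration and is not drawn from any existing ontology.

```python
# Minimal sketch of an entity-relationship style concept store.
# Concept and relation names are invented for illustration only.

class Concept:
    def __init__(self, name):
        self.name = name
        self.relations = {}  # relation label -> set of target Concepts

    def add_relation(self, label, target):
        self.relations.setdefault(label, set()).add(target)

entity = Concept("entity")
animal = Concept("animal")
dog = Concept("dog")
animal.add_relation("is-a", entity)
dog.add_relation("is-a", animal)
dog.add_relation("has-part", Concept("tail"))

def isa_chain(c):
    """Follow 'is-a' links upward, collecting ancestor names."""
    chain = []
    while c.relations.get("is-a"):
        c = next(iter(c.relations["is-a"]))
        chain.append(c.name)
    return chain

print(isa_chain(dog))  # ['animal', 'entity']
```

The point of such a structure is simply that concepts are defined by their labeled links to other concepts, which is why, as argued above, no single concept can be tested in isolation from the rest of the vocabulary.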
Having this theory easily available to the entire research community will maximize the likelihood of finding and addressing inadequacies in its structure.

=============================================================
Ontologies have not been shown to be notably useful for NLU. Why spend resources building a bigger one?
=============================================================

There is apparently a widespread notion that ontologies, and specifically the CYC ontology, have been tested for utility in Natural Language Understanding and have not proved useful. It is important to address this perception. In fact, attempts to use CYC in natural language have been very modest in terms of time spent, and the main virtue of CYC, its logical structure, has scarcely been tested at all in NLU applications. It is also important to recall that CYC was not designed with use in NLU as a primary objective (as the ontological lexicon suggested here would be), although Lenat had expected it would be useful for that purpose. CYC has two other important flaws which would not apply to an ontology built as suggested here: (1) CYC was built by a single group with a specific viewpoint, and did not include input from many other practitioners of diverse schools of knowledge representation, ontology, and lexical semantics. Regardless of its internal consistency, it cannot serve as a focus to bring together a large number of groups to use it as a common reference standard; and (2) most of CYC is not publicly available, and use of CYC presents difficult legal issues. Although it can be useful for specific industrial contractors, its lack of public availability makes it unsuitable for use as a research tool; even when it is made available to academic groups, detailed results of research cannot be freely described, nor modified versions redistributed to other groups.
The study that may most directly account for the perception of CYC's inadequacy was performed in 1996 by Nirenburg's group at NMSU ("An assessment of Cyc for Natural Language Processing", MCCS-96-302, available on the Web at http://crl.nmsu.edu/Research/Pubs/MCCS/Abstracts/mccs-96-302.htm). This study of the utility of CYC for Natural Language research found that several desirable features were absent. It did not, however, suggest that the existing structure could not be used, but rather that it needed additional components or structures to be more useful. It did not draw any negative conclusions about ontologies generally, and indeed that study group has its own ontology, which it finds more directly useful for its purposes. Perhaps of greater relevance is the widespread use of WordNet and EuroWordNet. Although this semantic network does not qualify as a logic-based upper ontology of the kind that would be constructed as suggested here, it does contain many conceptual relations which would probably be widely accepted as part of the larger ontological lexicon that could be constructed if adequate funding were available. The wide use of WordNet provides strong evidence that when well-structured and easily usable resources are publicly available, they prove to be valuable tools for research. This is scarcely surprising, as progress in many types of research is limited by the tools available. Since no ontology has yet been constructed with even close to the amount of detail needed for understanding of language, it is far too early to draw conclusions as to how useful a fully developed and publicly available ontology would be. One of the purposes of developing a comprehensive ontological lexicon would be to discover how useful the present ideas about knowledge representation really are, without the impediment of multiple small and incompatible sets of data on word meanings.
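As a concrete picture of the kind of resource being discussed, here is a minimal sketch of a WordNet-style sense inventory, in which one lemma maps to several senses; every entry below is invented for the example and is not taken from WordNet itself.

```python
# Toy sense inventory in the style of WordNet: lemmas map to synset
# identifiers, each carrying a gloss and a hypernym link.
# All entries are invented for illustration.

synsets = {
    "bank.n.1": {"gloss": "a financial institution",
                 "hypernym": "institution.n.1"},
    "bank.n.2": {"gloss": "sloping land beside a body of water",
                 "hypernym": "slope.n.1"},
}
lemma_index = {"bank": ["bank.n.1", "bank.n.2"]}

def senses(word):
    """Return (synset id, gloss) pairs for each recorded sense of a word."""
    return [(sid, synsets[sid]["gloss"]) for sid in lemma_index.get(word, [])]

for sid, gloss in senses("bank"):
    print(sid, "-", gloss)
```

Even this toy shows the two ingredients the text attributes to WordNet: an inventory of discrete senses per word, and conceptual relations (here only hypernymy) linking those senses into a network.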
Smaller ontologies have in fact been shown to be useful to some extent in language-understanding tasks, such as disambiguation, but thus far those available have not been shown to dramatically improve performance. Nor should they necessarily: as mentioned, a comprehensive ontology does not by itself constitute a language-understanding system; there are many additional aspects of language-understanding systems that must be developed as well. Although an ontology is not the only component of a language-understanding system, or even the main one, and its usefulness depends directly on the systems in which it is used, some form of common ontology is a necessary prerequisite for sharing research results in language understanding wherever the actual meanings of linguistic expressions need to be represented. Many specialized ontologies have been constructed which are not designed to be used in language understanding. But until a common representation of word meanings is used by more than one or two groups, advancement toward human-level understanding of language will be very difficult, and is likely to be slow and inefficient. The proposed ontology is intended to be useful for NLU as well as for other purposes, such as database interoperability. It will therefore need to be connected intimately with the lexicon, and as much as possible of the kind of detailed lexical information found in Melcuk's Explanatory-Combinatorial Dictionary will have to be included. As mentioned above, what is needed is better thought of as an ontological lexicon.

=====================================================
Would there be any images or graphical information representation in the ontology?
=====================================================

It may be true that some degree of imagery or graphical representation will be required to adequately represent certain concepts or word meanings.
Whether it will be feasible to include such data in the first version of an ontological lexicon will have to be decided by those participating in the organization of the effort. It would be helpful if individuals who have worked on graphical information representation participated in this study.

==============================================================
Different people use different internal ontologies, and to some extent different lexicons. How can we include all of those differences in a single consistent database?
==============================================================

In order for the word senses of a language to serve as a completely accurate medium of communication between agents, they must be identical between speaker and listener, or some degree of miscommunication or ambiguity will result. It happens in human-to-human communication that use of words in different senses by different people causes errors in the communication process. In human-to-computer communication, similar differences in internal representation will likewise lead to some miscommunication, though this can be eliminated in computer-to-computer communication. Special procedures for recognizing when variants of meaning are being used will probably have to be part of the implementing systems, and may not be includable in the ontological lexicon itself. Words that are commonly used in variant senses, or that have productive polysemous meanings, can be marked as such, and the broadest senses can be included, even though the procedures for recognizing variants of meaning may not be contained within the lexicon. These are the cases where recording collocational use may be especially helpful for disambiguating the sense.
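Recorded collocational use can support a very simple disambiguation heuristic, sketched below as an overlap count in the style of the Lesk algorithm; the senses and collocates are invented for illustration, and this is my choice of example rather than a procedure proposed in the text.

```python
# Toy collocation-based sense choice: pick the sense whose recorded
# collocates overlap most with the observed context words.
# Sense labels and collocate sets are invented for illustration.

sense_collocates = {
    "bank/finance": {"money", "loan", "account", "deposit"},
    "bank/river": {"water", "shore", "fishing", "mud"},
}

def choose_sense(context_words):
    """Score each sense by overlap with the context; return the best."""
    scores = {s: len(c & context_words) for s, c in sense_collocates.items()}
    return max(scores, key=scores.get)

ctx = {"he", "opened", "an", "account", "to", "deposit", "money"}
print(choose_sense(ctx))  # bank/finance
```

The heuristic itself would live in the implementing system, as the text suggests; what the ontological lexicon would contribute is the marked senses and their recorded collocational data.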
It is necessary first to build a basic lexicon and ontology of words which identifies the most common senses used by almost all native speakers of a language, and from that base to add less common or idiosyncratic variants wherever such variants have some significant level of usage. The differences among individuals' internal lexical representations, if they are sufficiently widespread, may have to be treated like multiple discrete senses of words, or like the semantic plasticity of polysemous words. In the real world, of course, widely variant use of language can be observed; a disordered speaker may produce a string of seemingly linguistic utterances that are completely uninterpretable by any other person, however skilled in the language used. The project is intended to produce only a basic reference vocabulary, and the recording of highly individualistic, poetic, and idiosyncratic usage of words will be beyond its scope. Most specialized uses will have to be dealt with by specialized systems built to handle such variation in usage. The common defining vocabulary will be the main concern, though the inclusion of some standardized or common uses of specialized technical words will be valuable, limited only by the time and resources available for extending the database core.

=================================================================
Will funding for construction of such an ontology reduce funding for other areas of Computational Linguistics?
=================================================================

In any recommendation made to congress for funding of this project, it must be strongly emphasized that the creation of a standard ontology/lexicon will not substitute for other aspects of computational linguistic research, but is only a tool for such research.
The reduction of funding for other aspects of CL research would be counter to the purpose of building the ontology, and would squander a resource built at significant expense. Those who contact funding agencies or members of congress to recommend this project need to be sure to emphasize this point.

======================================================================
Will recommendations by an ACL committee for congressional funding constitute lobbying and jeopardize the tax-exempt status of the ACL?
======================================================================

A study of public issues which includes comments on the need for and effects of government action does not constitute lobbying, and such studies are performed routinely by institutions and think tanks, such as ECRI, without affecting their tax-exempt status. The ACL will not as an institution make recommendations directly to members of congress. Individuals who are interested in the subject may cite an ACL study to support the need for funding. An unfunded and relatively informal study of this type is unlikely by itself to carry sufficient weight to move congress to action, but ideally it could prompt the organization of a more formal study of the need for funding of a standard ontology, for example by the National Academy of Sciences, or by think tanks concerned with technical issues whose opinions are valued by members of congress.

=======================================================================
How can we expect that ontologists and lexical semanticists with different viewpoints could ever be induced to agree on a common approach?
=======================================================================

It will indeed likely be difficult to forge agreement on specific issues, but where there is a recognition of the need for compromise, it can be accomplished.
Building research resources is in many respects an engineering rather than a research activity, and the mindset required for such a task is quite different from the attitudes that make for successful basic research. One example of this difference was eloquently narrated in Kip Thorne's book "Black Holes and Time Warps", in which he described the analogous difficulty of coordinating several teams, each accustomed to basic theoretical research, in a new effort to design and build an expensive interferometric detector for gravity waves:

"Within each team the individual scientists had free rein to invent new ideas and pursue them as they wished for as long as they wished; coordination was very loose. This is just the kind of culture that inventive scientists love and thrive on, the culture that Braginsky craves, a culture in which loners like me are happiest. But it is not a culture capable of designing, constructing, debugging, and operating large, complex scientific instruments like the several-kilometer-long interferometers required for success. To design in detail the many complex pieces of such interferometers, to make them all fit together and work together properly, and to keep costs under control and bring the interferometers to completion within a reasonable time requires a different culture: a culture of tight coordination, with subgroups of each team focusing on well-defined tasks and a single director making decisions about what tasks will be done when and by whom. The road from freewheeling independence to tight coordination is a painful one. . . ."

He continues that, with reluctance and prodding from the funding agency, the freewheeling and independent scientists made the necessary adjustments.
An ontological lexicon for Computational Linguistics is of course a different type of research tool from a gravity-wave detector (and probably of much more immediate practical utility), but the need to build a unified structure that is tightly coordinated and internally consistent may be even greater than in building physical measuring instruments, because an ontology is likely to be sensitive to inconsistencies even between widely separated parts. Given the imperative for close coordination in ontology construction, is there a plausible way to achieve the necessary cooperation of groups with disparate viewpoints? I will suggest one possible scenario. If the prospect of organizing development of a standard ontology, as suggested here, reaches the stage where funding looks like a realistic possibility, discussions or a conference should be organized among those who would want to participate in its construction, to determine how many of the disparate systems could be integrated into a single consistent system. In such discussions, the teams will develop some appreciation of the likelihood that their own views may or may not be adopted, intact or in modified form. Since the most important goal will be to create a database that will be used by the largest number of research teams, at some point disagreements about which formats or approaches to adopt will probably have to be resolved by some form of voting among participating groups, and the project director will need to be able to resolve any issues not amenable to the voting approach. Any group which recognizes that its own approach is incompatible with the majority and is likely not to be adopted can try to argue for its technical superiority; but if the arguments are not accepted, such a group will face the choice of participating and adapting its own system to the dominant approach, or not participating and continuing its own independent line of research.
There will presumably be some groups interested in exploring novel approaches to knowledge representation that will want to continue along lines different from those adopted by the majority. However, from discussions I have held with people involved in the investigation of word meanings, there appears to be wide recognition of the need for some common database, and many or most groups are likely to participate in such a project. By the time that project proposals need to be submitted, there should be some preliminary agreement as to the likely outline of the general structure of the database to be developed. Disagreements over details will need to be resolved in the course of actual funded development, but there will need to be some mechanism, whether by voting of an executive committee or decision of a project chairperson, to resolve residual disagreements by fiat. The manner of selection of the project chairperson would ideally include substantial input from the likely participants in the project. To accommodate input from as many of the existing groups as possible, the number of persons funded for this project is likely to approach or exceed two hundred over an initial development stage of three to five years. The required funding for a project of that size will be close to two hundred million dollars ($200,000,000) over the five years. This will almost certainly require a special appropriation from congress. Other areas of science, including highly theoretical fields with few immediate practical applications, have succeeded in obtaining funding for projects comparable to and often much larger than this (the *annual* maintenance budget of the Hubble telescope is about $200 million). The possibility of congressional funding is realistic, provided that an adequate justification can be agreed upon among practitioners in the field.
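The two-hundred-million-dollar figure above is consistent with a simple back-of-envelope check; the fully-loaded cost per person-year used below is my own assumption, not a number stated in the text.

```python
# Rough consistency check on the cited estimate: ~200 funded people
# over 5 years. The per-person-year cost is an assumed round figure.
people = 200
years = 5
cost_per_person_year = 200_000  # assumed fully-loaded cost, not from the text
total = people * years * cost_per_person_year
print(f"${total:,}")  # $200,000,000
```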
That is the purpose of forming this committee, and I hope that all of those who may have some use for an ontological lexicon will respond with information about potential uses, which will allow us to demonstrate the cost-effectiveness of such a project.

___________________________________________________________________
Message distributed by the Langage Naturel list.
Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/
English version: http://www.biomath.jussieu.fr/LN/LN/
Archives: http://web-lli.univ-paris13.fr/ln/

From pb at lpl.univ-aix.fr Sat Apr 8 09:53:57 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:53:57 +0200
Subject: Appel: TAPD-2000
Message-ID:
From: "Miguel A. Alonso Pardo"

======================================================================
Final CALL FOR PAPERS
----------------------------------------------------------------------
TAPD 2000
2nd Workshop on 'Tabulation in Parsing and Deduction'
----------------------------------------------------------------------
September 19-21, 2000, Vigo, Spain

Sponsored by University of Vigo, with the support of Caixa Vigo e Ourense and Logic Programming Associates

http://coleweb.dc.fi.udc.es/tapd2000/

Following TAPD'98 in Paris (France), the next TAPD event will be held in Vigo (Spain) in September 2000. The workshop will take place just before SEPLN 2000 (http://coleweb.dc.fi.udc.es/sepln2000/), the conference of the Spanish Society for Natural Language Processing.

MOTIVATIONS: Tabulation techniques are becoming a common way to deal with the highly redundant computations that occur, for instance, in Natural Language Processing, Logic Programming, Deductive Databases, and Abstract Interpretation, and that are related to phenomena such as ambiguity, non-determinism, and domain ordering.
Different approaches, including Chart Parsing, Magic-Set rewriting, Memoization, and Dynamic Programming, have been proposed; their key idea is to keep traces of computations in order to achieve computation sharing and loop detection. Tabulation also offers more flexibility to investigate new parsing or proof strategies and to represent ambiguity by shared structures. The first objective of this workshop is to compare and discuss these different approaches. The second objective is to present tabulation and tabular systems to potential users in different application areas such as natural language processing, picture parsing, genome analysis, and complete deduction techniques.

TOPICS (not exclusive):
-- Tabulation Techniques: Chart Parsing, Tabling, Memoization, Dynamic Programming, Magic Set, Generic Fix-Point Algorithms
-- Applications: Parsing, Generation, Logic Programming, Deductive Databases, Abstract Interpretation, Deduction in Knowledge Bases, Theorem Proving
-- Static Analysis: Improving tabular evaluation
-- Parsing or resolution strategies
-- Efficiency issues: Dealing with large tables (structure sharing, term indexing), Execution models, Exploiting the domain ordering (subsumption)
-- Shared structures (parse or proof forests): Formal analysis, representation and processing

WORKSHOP FORMAT: The workshop will be a 3-day event providing a forum for individual presentations of the accepted contributions as well as group discussions.

INVITED SPEAKERS:
Bharat Jayaraman -- Univ. of New York at Buffalo, US
I.V. Ramakrishnan -- Univ. New York at Stony Brook, US

SUBMISSION PROCEDURE: Authors are invited to submit, before April 28, a 4-page position paper or abstract describing a theoretical contribution or a system to be presented. Due to tight time constraints, submission and reviewing will be handled exclusively electronically (LaTeX, PostScript, DVI, or ASCII format).
Submissions should include the title, authors' names, affiliations, addresses, and e-mail, and must be sent to David S. Warren (warren at cs.sunysb.edu) as gzipped PostScript.

SCHEDULE:
Submission of contributions: April 28, 2000
Notification of acceptance: June 1, 2000
Final versions due: June 30, 2000

PROGRAM COMMITTEE CHAIR:
David S. Warren -- Univ. New York at Stony Brook, US

PROGRAM COMMITTEE:
Francois Bry -- Univ. Munich, Germany
Manuel Carro -- Univ. Polit. Madrid, Spain
Eric de la Clergerie -- INRIA, France
Veronica Dahl -- Univ. Simon Fraser, Canada
Baudouin Le Charlier -- Univ. Namur, Belgium
Mark Jan Nederhof -- DFKI, Germany
Luis M. Pereira -- Univ. Nova de Lisboa, Portugal
Martin Rajman -- EPFL, Switzerland
Domenico Sacca -- Univ. della Calabria, Italy
Kostis Sagonas -- Univ. Uppsala, Sweden
David Shasha -- Univ. New York, US
Terrance Swift -- Univ. New York at Stony Brook, US
Manuel Vilares -- Univ. Vigo, Spain
David Weir -- Univ. Sussex, UK

ORGANIZING COMMITTEE CHAIR:
Manuel Vilares -- Univ. Vigo, Spain

ORGANIZING COMMITTEE:
Miguel A. Alonso -- Univ. Coruna, Spain
Eric de la Clergerie -- INRIA, France
David Cabrero -- Univ. Vigo, Spain
Victor M. Darriba -- Univ. Coruna, Spain
David Olivieri -- Univ. Vigo, Spain
Francisco J. Ribadas -- Univ. Coruna, Spain
Leandro Rodriguez -- Univ. Vigo, Spain

PUBLICATION: Papers accepted by the Program Committee must be presented at the conference and will appear in a proceedings volume. The format for camera-ready manuscripts will be available from the web page of the event.
LOCATION:
Auditorio del Centro Cultural Caixavigo e Ourense
Marques de Valladares
Vigo, Spain

FURTHER INFORMATION: For further details consult http://coleweb.dc.fi.udc.es/tapd2000/, or contact:
TAPD'2000 Secretariat
Escuela Superior de Ingeniería Informática
Campus As Lagoas, s/n
32004 Ourense, Spain
E-mail: tapd-secret at ei.uvigo.es
Fax: +34 988 387001

From pb at lpl.univ-aix.fr Sat Apr 8 09:54:33 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:54:33 +0200
Subject: Job: 1 Offer
Message-ID:
From: "lilian.blochet"

[Translated from French:] Please circulate this job offer:

Technologies GID, publisher of Spirit, a natural-language search engine for Internet/Intranet, is recruiting a Linguistics Project Manager. Within the Research and Development department, this person will be responsible for:
-- Leading a team of linguists working on the enrichment of the French and English linguistic resources
-- Managing relations with foreign partners for Spanish, Portuguese, Dutch, and German
-- Building the tools needed to manage the resources
-- Participating in the specification, prototyping, and testing of new versions of the morpho-syntactic analyzer
-- Technology watch

The candidate, with a higher degree in computational linguistics or natural language processing and a few years of professional experience, must be autonomous in the Unix environment and in Perl/Awk programming. Native language French or English. Salary according to profile and experience.

Send your application by e-mail to mailto:lilian.blochet at technologies-gid.com

Technologies GID, 84/88 Bld de la Mission Marchand, 92411 Courbevoie Cedex

Lilian Blochet, Technologies-GID
Director, Research and Development Department
mailto:lilian.blochet at technologies-gid.com * 01 49 04 70 70

From pb at lpl.univ-aix.fr Sat Apr 8 09:55:07 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:55:07 +0200
Subject: Appel: Workshop on HCC
Message-ID:
From: Yorick Wilks

THIRD ANNOUNCEMENT AND FINAL CALL FOR ABSTRACTS
(Apologies if you receive this from more than one source.)

THIRD WORKSHOP ON HUMAN-COMPUTER CONVERSATION
Grand Hotel Villa Serbelloni, Bellagio, Italy
3-5 July 2000

Everything is on the website, including on-line registration information, hotels (from simple to sumptuous), the glorious site, etc. The key date is 8 April, when abstracts are due, and that is only a few days away. Hotel accommodation should be booked as soon as possible.

www.dcs.shef.ac.uk/research/units/ilash/Meetings/bellagio/

Invited speakers include (not all have yet accepted): Dr B. Alabiso, Microsoft, USA; Dr J. Hutchens, UWA, Australia; Prof. G. Leech, University of Lancaster, UK; Dr U. Reithinger, DFKI-Saarbruecken, DE; Dr T. Strzalkowski, General Electric, USA; Prof. D.
Traum, U Maryland, USA.

The Workshops on Human-Computer Conversation in Bellagio, Italy, took place in 1997 and 1998: small groups of experts from industry and academia met to discuss this pressing question for the future of Language Engineering, not only as an academic question, but chiefly in order to bring forward for discussion computer demonstrations and activities within company laboratories that were not being published or discussed. The Workshops were highly successful in these aims, and we now wish to widen participation and add distinguished speakers, as well as introducing more theoretical topics, though without losing the practical emphasis. The site remains one of the finest in the world, and it promoted excellent and intimate discussions in 1997 and 1998. The emphasis this year will take note of the EC Fifth Framework calls announced under Human Language Technology, and in particular the emphasis on interactivity. We also plan to emphasise (in invited talks) the issue of politeness and whether it is crucial or dispensable to conversation, as well as recent results on statistical/empirical work on dialogue corpora and on deriving marked-up dialogue corpora. All details, including previous programs, the program committee, accommodation and travel, and details of registration are on the web site. Contributions are invited on any aspect of human-computer conversation, as are demonstrations. Two-page abstracts should be sent by mail or email to the address at the bottom according to the following timetable:

Deadline for submission: 8 April 2000
Notice of acceptance: 8 May 2000
Camera-ready paper due: 8 June 2000

The European Association for Computational Linguistics (EACL), SigDial and ELSNET have endorsed the meeting.
Submissions and further enquiries to: hccw at dcs.shef.ac.uk

Yorick Wilks
HCCW '2000
Department of Computer Science
University of Sheffield
211 Portobello St., Sheffield S1 4DP, UK
phone: (44) 114 222 1814
fax: (44) 114 222 1810
email: hccw at dcs.shef.ac.uk
www: http://www.dcs.shef.ac.uk/~yorick

From pb at lpl.univ-aix.fr Sat Apr 8 09:57:09 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Sat, 8 Apr 2000 11:57:09 +0200
Subject: Appel: DCAGRS 2000
Message-ID:
From: Helmut Jurgensen

Final Call: DCAGRS 2000
Descriptional Complexity of Automata, Grammars and Related Structures

Please note that the submission deadline has changed to 15 April 2000.

Submissions concerning the descriptional complexity of automata, grammars and related structures are invited for a workshop to be held in London, Ontario, on 27--29 July 2000. Topics include, but are not limited to, the following:
-- various measures of descriptional complexity of automata, grammars and languages
-- circuit complexity of Boolean functions and related measures
-- succinctness of description of (finite) objects
-- descriptional complexity in resource-bounded or structure-bounded environments
-- structural complexity

Papers on applications of such issues, for instance in the fields of software or hardware testing and systems modelling, as well as demonstrations of systems related to these issues, are also welcome.

DCAGRS 2000 will be part of a three-conference event held at the University of Western Ontario in London, Ontario, Canada, in the week of July 24 to 29, 2000:
-- CIAA 2000, the Conference on the Implementation and Application of Automata, held on 24--25 July.
-- Half Century of Automata Theory, held on 26 July.
-- DCAGRS 2000, held on 27--29 July. There will also be a workshop on coding theory held on 31 July and 1 August at the same location. For more information about these events visit any of the following www-sites: www.csd.uwo.ca/~ciaa2000 (CIAA 2000) www.cs.uni-potsdam.de/~dcagrs (DCAGRS 2000) www.cs.uni-potsdam.de/~dcagrs/triconf.html (Tri-Conference) www.cs.uni-potsdam.de/~dcagrs/codes.html (Coding Theory) www.csd.uwo.ca/~automata (Half Century) and follow the links from there. The DCAGRS 2000 deadlines are as follows: -- 15 April 2000, submission of papers -- 1 May 2000, submission of demo proposals -- 5 May 2000, notification of authors -- 1 July 2000, submission of final copy for pre-proceedings -- 27--29 July 2000, workshop Details regarding the submission procedures are available on the www-sites listed above. If you have difficulties accessing the www, we can send you the information by email. In that case, please send your request to boldt at cs.uni-potsdam.de (Oliver Boldt) DCAGRS is sponsored by IFIP WG 1.2 Conference chair for DCAGRS 2000: Helmut Jurgensen, helmut at uwo.ca From pb at lpl.univ-aix.fr Sat Apr 8 09:57:45 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Sat, 8 Apr 2000 11:57:45 +0200 Subject: Appel: COLING-2000 Workshop Message-ID: From: Remi Zajac Call for submissions for the COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems Centre Universitaire, Luxembourg, 5 August 2000 (see also this call at http://crl.nmsu.edu/Events/COLING00) Background The purpose of the workshop is to present the state of the art in NLP toolsets and workbenches that can be used to develop multilingual and/or multi-application NLP components and systems.
Although technical presentations of particular toolsets are of interest, we would like to emphasize methodologies and practical experiences in building components or full applications with an NLP toolset. Combined demonstrations and paper presentations are strongly encouraged. Many toolsets have been developed to support the implementation of single NLP components (taggers, parsers, generators, dictionaries) or complete Natural Language Processing applications (Information Extraction systems, Machine Translation systems). These tools aim at facilitating and lowering the cost of building NLP systems. Since the tools themselves are often complex pieces of software, they require a significant amount of effort to develop and maintain in the first place. Is this effort worth the trouble? Note that NLP toolsets have often originally been developed for implementing a single component or application. In that case, why not build the NLP system in a general programming language such as Lisp or Prolog? There are at least two answers. First, for efficiency (in speed and space), it is often preferable to build a parameterized algorithm operating on a uniform data structure (e.g., a phrase-structure parser). Second, it is harder, and often impossible, to develop, debug and maintain a large NLP system written directly in a general programming language. It has been the experience of many users that a given toolset is quite often unusable outside its original environment: the toolset can be too restricted in its purpose (e.g., an MT toolset that cannot be used for building a grammar checker), too complex to use, or simply too difficult to install. There have been efforts, in particular in the US under the Tipster program, to promote instead common architectures for a given set of applications (primarily IR and IE in Tipster; see also the Galaxy architecture of the DARPA Communicator project).
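[Editor's illustration] The call's first answer, a parameterized algorithm operating on a uniform data structure, is exactly the shape of a chart parser: the grammar and lexicon are plain parameters, and the chart is the uniform table. As a purely illustrative sketch (the toy grammar below is invented here and is not part of the call), a minimal CKY recognizer in Python:

```python
from itertools import product

def cky_recognize(words, lexicon, rules, start="S"):
    """Minimal CKY recognizer. `lexicon` maps a word to a set of
    preterminals; `rules` maps a pair (B, C) to the set of parents A
    for binary rules A -> B C. Returns True iff `start` spans the input."""
    n = len(words)
    # chart[i][j] holds the nonterminals derivable over words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point; sub-results are shared
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= rules.get((b, c), set())
    return start in chart[0][n]

# Toy grammar, invented for illustration only
lexicon = {"the": {"Det"}, "cat": {"N"}, "sleeps": {"VP"}}
rules = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}}
print(cky_recognize("the cat sleeps".split(), lexicon, rules))  # True
```

Every constituent over a span is computed once and stored in the chart, so swapping in a different grammar requires no change to the algorithm itself.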
Several software environments have been built around this flexible concept, which is closer to current trends in mainstream software engineering. The workshop aims at providing a picture of the current problems faced by developers and users of toolsets, and of future directions for the development and use of NLP toolsets. We encourage reports of actual experiences in the use of toolsets (complexity, training, learning curve, cost, benefits, user profiles) as well as presentations of toolsets concentrating on user issues (GUIs, methodologies, on-line help, etc.) and application development. Demonstrations are also welcome. Audience Researchers and practitioners in Language Engineering, users and developers of tools and toolsets. Issues Although individual tools (such as a POS tagger) have their uses, they typically need to be integrated into a complete application (e.g. an IR system). Language Engineering issues in toolsets and architectures include (in no particular order): Practical experience in the use of a toolset; Methodological issues associated with the use of a toolset; Benefits and deficiencies of toolsets; User (linguist/programmer) training and support; Adaptation of a tool (or toolset) to a new kind of application; Adaptation of a tool to a new language; Integration of a tool into an application; Architectures and support software; Reuse of data resources vs. processing components; NLP algorithmic libraries. Format of the Workshop The one-day workshop will include twelve presentation slots, each consisting of a 20-minute presentation followed by 10 minutes reserved for discussion. We encourage authors to focus on the salient points of their presentation and to identify possible controversial positions. There will be ample time set aside for informal and panel discussions and audience participation. Please note that workshop participants are required to register at http://www.coling.org/reg.html. Deadlines 21 May 2000: Submission deadline.
11 June 2000: Notification to authors. 24 June 2000: Final camera-ready copy. 5 August 2000: COLING-2000 Workshop. Submission Format Send submissions of no more than 6 pages conforming to the COLING format (http://www.coling.org/format.html) to zajac at crl.nmsu.edu. We prefer electronic submissions in either PDF or PostScript. Final submissions can extend to 10 pages. Organizing Committee Rémi Zajac (Chair), CRL, New Mexico State University, USA: zajac at crl.nmsu.edu. Jan Amtrup, CRL, New Mexico State University, USA: jamtrup at crl.nmsu.edu. Stephan Busemann, DFKI, Saarbrücken: busemann at dfki.de. Hamish Cunningham, University of Sheffield: hamish at dcs.shef.ac.uk. Guenther Goerz, IMMD VIII, University of Erlangen: goerz at immd8.informatik.uni-erlangen.de. Gertjan van Noord, University of Groningen: vannoord at let.rug.nl. Fabio Pianesi, IRST, Trento: pianesi at irst.itc.it. Of Related Interest The Natural Language Software Registry at http://www.dfki.de/lt/registry/sections.html The COLING-2000 Web Site at http://www.coling.org/ From pb at lpl.univ-aix.fr Sat Apr 8 09:57:51 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Sat, 8 Apr 2000 11:57:51 +0200 Subject: Ecole: CLaRK'2000 Message-ID: From: Frank Richter Announcement: Summer School 2000 in Bulgaria - CLaRK'2000 The Tuebingen-Sofia International Graduate Programme in Computational Linguistics and Represented Knowledge (CLaRK) invites applications for a summer school in Sozopol, Bulgaria, this summer.
Dates: August 27th - September 10th 2000 (days of arrival and departure) Place: Resort town of Sozopol (Black Sea), Bulgaria Language: English Participants: Participants should be doctoral students whose research addresses the interfaces between computer science, cognitive science, linguistics, mathematics and philosophy. In exceptional cases, postdoctoral researchers as well as outstanding students in the final year of masters-level studies who intend to pursue a doctorate will also be considered. The summer school is limited to 25 students. Places are allocated competitively on the basis of the research interests of the participants and the perceived benefit of attending the summer school to those interests. Participants must be proficient in English. Stipends: Via the CLaRK Program, the Volkswagen Foundation will provide stipends for up to 6 students from the countries of Central and Eastern Europe and 6 further students from Bulgaria. The stipends will be awarded on a competitive basis. The stipends will comprise travel costs (up to DEM 600) and room and board for the duration of the summer school. At the discretion of the CLaRK Program, the stipends may include additional support for travel costs above DEM 600. Costs: Participants who are not sponsored by a CLaRK stipend should anticipate approximately DEM 125 per day for room and board. Costs for transportation to and from the summer school are not included in this estimate. Applications: Applications comprising a completed registration form (available from www.uni-tuebingen.de/IZ/application.rtf), a curriculum vitae, and a short (maximum three pages) summary of relevant past and present research and education must be submitted to the Office of the International Centre at Tuebingen by 30th April 2000. Applicants should indicate whether they are applying for a CLaRK stipend. Applications for a CLaRK stipend must include a letter of recommendation.
Internationales Zentrum fuer Wissenschaftliche Zusammenarbeit Universitaet Tuebingen Keplerstr. 17 D - 72074 Tuebingen Tel.: (0049) 7071 / 29 - 77352 or /29 - 74156 Fax: (0049) 7071 / 29 5989 e-mail: iz at uni-tuebingen.de WWW: www.uni-tuebingen.de/IZ/starte.html Content and Goals Computational linguistics and knowledge representation are two distinct disciplines that share a common concern with what knowledge is, how it is used, and how it is acquired. However, though knowledge representation and computational linguistics clearly address broadly similar research problems, research within each of these fields has hitherto been largely ignorant of research in the other. Moreover, the ignorance the two fields have of each other both fosters and is fostered by a wide gulf between the educations received by students of knowledge representation and students of computational linguistics. The goal of the summer school is to help bridge this gulf by introducing the summer school students to recent developments in the interdisciplinary field of computational linguistics and knowledge representation. The summer school will take the form of courses in various topics. The program provisionally includes courses in computational morphology, corpus linguistics, declarative knowledge representation, natural language semantics, Slavic syntax and psycholinguistics. 
Preliminary Course Program Erhard Hinrichs, Sandra Kuebler: Computational Tools for Corpus Linguistics Valia Kordoni/Frank Richter: A Comparison of LFG and HPSG Anna Kupsc: Slavic in HPSG Detmar Meurers: Introduction to HPSG Janina Rado: Introduction to Psycholinguistics Kiril Simov/Gergana Popova: Computational Morphology Kiril Simov/Atanas Kiryakov: Declarative Knowledge Representation Contact for further information: Kiril Ivanov Simov (Sofia): kivs at bgcict.acad.bg Frank Richter (Tuebingen): fr at sfs.nphil.uni-tuebingen.de WWW: http://www.sfs.nphil.uni-tuebingen.de/clark/ From pb at lpl.univ-aix.fr Sat Apr 8 09:57:52 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Sat, 8 Apr 2000 11:57:52 +0200 Subject: Conf: Large Corpora and Annotation Standards Message-ID: From: "Nancy M. Ide" Large Corpora and Annotation Standards http://www.cs.vassar.edu/~ide/ANLP-NAACL2000.html Held in conjunction with ANLP/NAACL'00 Seattle, Washington 4 May 2000 1-6pm This meeting is intended to bring together researchers and developers from a variety of domains in text, speech, video, etc., to look broadly at the technical issues that bear on the development of software systems and standards for the annotation and exploitation of linguistic resources. The goal is to lay the groundwork for the definition of a data and system architecture to support corpus annotation and exploitation that can be widely adopted within the community.
Among the issues to be addressed are: - layered data architectures - system architectures for distributed databases - support for plurality of annotation schemes - impact and use of XML/XSL - support for multimedia, including speech and video - tools for creation, annotation, query and access of corpora - mechanisms for linkage of annotation and primary data - applicability of semi-structured data models, search and query systems, etc. - evaluation/validation of systems and annotations The motivation for this meeting is the American National Corpus (ANC) effort, which should begin corpus creation within the year. We anticipate that the ANC will provide a significant resource for natural language processing, and we therefore seek to identify state-of-the-art methods for its creation, annotation, and exploitation. Also, as a national and freely available resource, the data and system architecture of the ANC is likely to become a de facto standard. We therefore hope to draw together leading researchers and developers to establish a basis for the design of a system to support the creation and use of the ANC. Provisional Program Overview of the American National Corpus Effort Nancy Ide and Catherine Macleod Searching Linguistically Annotated Corpora Chris Brew Considerations for Large Corpus Annotation: Intercoder Reliability Rebecca Bruce and Janyce Wiebe The XML Framework and Its Implications for Large Corpus Access Nancy Ide The ATLAS System John Henderson Annotation Standards and Their Impact on Large Corpus Development Nicoletta Calzolari A Framework for Multi-level Linguistic Annotation Patrice Lopez and Laurent Romary Discussion: Requirements for the ANC A related workshop will be held at the LREC conference on May 29-30, 2000. See http://www.cs.vassar.edu/~ide/anc/lrec.html.
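[Editor's illustration] Several of the issues listed above (layered data architectures, linkage of annotation and primary data) are commonly addressed with standoff annotation, where annotation layers point at character offsets in an immutable primary text rather than embedding markup in it. The sketch below is purely illustrative; the class and field names are invented here and are not the ANC or ATLAS design:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: int   # character offset into the primary text
    end: int
    layer: str   # e.g. "pos", "syntax"
    label: str

@dataclass
class AnnotatedDoc:
    text: str                               # primary data, never modified
    annotations: list = field(default_factory=list)

    def add(self, start, end, layer, label):
        self.annotations.append(Annotation(start, end, layer, label))

    def spans(self, layer):
        """All (surface string, label) pairs for one annotation layer."""
        return [(self.text[a.start:a.end], a.label)
                for a in self.annotations if a.layer == layer]

doc = AnnotatedDoc("The cat sleeps.")
doc.add(0, 3, "pos", "DT")
doc.add(4, 7, "pos", "NN")
doc.add(8, 14, "pos", "VBZ")
doc.add(0, 7, "syntax", "NP")
print(doc.spans("pos"))
```

Because the layers only reference offsets, new annotation schemes can be added or validated independently without touching the primary data, which is what makes the layered architecture workable.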
Organizer: Nancy Ide Professor and Chair Department of Computer Science Vassar College Poughkeepsie, NY 12604-0520 USA Tel: +1 914 437-5988 Fax: +1 914 437-7498 ide at cs.vassar.edu From pb at lpl.univ-aix.fr Tue Apr 18 16:50:22 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Tue, 18 Apr 2000 18:50:22 +0200 Subject: Appel: TELRI Message-ID: From: "Patrick Ruch" From pb at lpl.univ-aix.fr Tue Apr 18 16:50:25 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Tue, 18 Apr 2000 18:50:25 +0200 Subject: Conf: 2 ANLP/NAACL announcements Message-ID: ____________________________________________________________________________ 1/ From: Priscilla Rasmussen Subject: Large Corpora & Annotation Standards at ANLP/NAACL2000 2/ From: radev at si.umich.edu Subject: ANLP/NAACL workshop on Automatic Summarization ____________________________________________________________________________ 1/ From: Priscilla Rasmussen Subject: Large Corpora & Annotation Standards at ANLP/NAACL2000 Large Corpora and Annotation Standards http://www.cs.vassar.edu/~ide/ANLP-NAACL2000.html Held in conjunction with ANLP/NAACL'00 Seattle, Washington 4 May 2000 1-6pm This meeting is intended to bring together researchers and developers from a variety of domains in text, speech, video, etc., to look broadly at the technical issues that bear on the development of software systems and standards for the annotation and exploitation of linguistic resources.
The goal is to lay the groundwork for the definition of a data and system architecture to support corpus annotation and exploitation that can be widely adopted within the community. Among the issues to be addressed are: - layered data architectures - system architectures for distributed databases - support for plurality of annotation schemes - impact and use of XML/XSL - support for multimedia, including speech and video - tools for creation, annotation, query and access of corpora - mechanisms for linkage of annotation and primary data - applicability of semi-structured data models, search and query systems, etc. - evaluation/validation of systems and annotations The motivation for this meeting is the American National Corpus (ANC) effort, which should begin corpus creation within the year. We anticipate that the ANC will provide a significant resource for natural language processing, and we therefore seek to identify state-of-the-art methods for its creation, annotation, and exploitation. Also, as a national and freely available resource, the data and system architecture of the ANC is likely to become a de facto standard. We therefore hope to draw together leading researchers and developers to establish a basis for the design of a system to support the creation and use of the ANC. Provisional Program Overview of the American National Corpus Effort Nancy Ide and Catherine Macleod Searching Linguistically Annotated Corpora Chris Brew Considerations for Large Corpus Annotation: Intercoder Reliability Rebecca Bruce and Janyce Wiebe The XML Framework and Its Implications for Large Corpus Access Nancy Ide The ATLAS System John Henderson Annotation Standards and Their Impact on Large Corpus Development Nicoletta Calzolari A Framework for Multi-level Linguistic Annotation Patrice Lopez and Laurent Romary Discussion: Requirements for the ANC A related workshop will be held at the LREC conference on May 29-30, 2000. See http://www.cs.vassar.edu/~ide/anc/lrec.html.
Organizer: Nancy Ide Professor and Chair Department of Computer Science Vassar College Poughkeepsie, NY 12604-0520 USA Tel: +1 914 437-5988 Fax: +1 914 437-7498 ide at cs.vassar.edu ____________________________________________________________________________ 2/ From: radev at si.umich.edu Subject: ANLP/NAACL workshop on Automatic Summarization CALL FOR PARTICIPATION ANLP/NAACL Workshop on Automatic Summarization Sunday, April 30, 2000 Westin Hotel Seattle, WA 48103 REGISTRATION (until April 20) http://www.gte.com/anlp-naacl2000 SCHEDULE 09:10-09:25 Introduction 09:25-10:15 Session on Content Selection 09:25-09:50 Concept Identification and Presentation in the Context of Technical Text Summarization Horacio Saggion and Guy Lapalme, DIRO-Universite de Montreal 09:50-10:15 Mining Discourse Markers for Chinese Textual Summarization Samuel W. K. Chan, Tom B. Y. Lai, W. J. Gao, and Benjamin K. Tsou, City University of Hong Kong 10:15-10:40 Session on Visualization 10:15-10:40 Multi-document Summarization by Visualizing Topical Content Rie Kubota Ando, Branimir K. Boguraev, Roy J. Byrd, and Mary S. Neff, Cornell University, and IBM Research 10:40-11:05 Coffee Break (provided) 11:05-12:20 Session on Multi-Document Summarization 11:05-11:30 Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies Dragomir R. Radev, Hongyan Jing, Margo Budzikowska, University of Michigan, Columbia University, and IBM Research 11:30-11:55 Extracting Key Paragraph based on Topic and Event Detection - Towards Multi-Document Summarization Fumiyo Fukumoto and Yoshimi Suzuki, Yamanashi University 11:55-12:20 Multi-Document Summarization By Sentence Extraction Jade Goldstein, Vibhu Mittal, Jaime Carbonell, and Mark Kantrowitz, Carnegie Mellon University and Just Research 12:20-01:50 Lunch Break (on your own) 01:50-03:05 Session on Evaluation 01:50-02:15 A Text Summarizer in Use: Lessons Learned from Real World Deployment and Evaluation Mary Ellen Okurowski, Harold Wilson, Joacquin Urbina, Tony Taylor, Ruth Colvin Clark, and Frank Krapcho, Department of Defense, SRA Corp, Clark Training & Consulting, and Kathpal Technologies Inc. 02:15-02:40 Evaluation of Phrase-representation Summarization based on Information Retrieval Task Mamiko Oka and Yoshihiro Ueda, Fuji Xerox Co., Ltd. 02:40-03:05 A Comparison of Rankings Produced by Summarization Evaluation Measures Robert L. Donaway, Kevin W. Drummey, and Laura A. Mather, Department of Defense and Britannica.com, Inc. 03:05-03:30 Coffee Break (provided) 03:30-04:30 Panel on "Language Modeling in Text Summarization" 04:30-04:55 Session on Multimedia Summarization 04:30-04:55 Using Summarization for Automatic Briefing Generation Inderjeet Mani, Kristian Concepcion, and Linda van Guilder, MITRE Corporation 04:55-06:00 Panel on "Summarization: Industry Perspectives" From pb at lpl.univ-aix.fr Tue Apr 18 16:50:27 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Tue, 18 Apr 2000 18:50:27 +0200 Subject: Conf: RIAO-2000 Message-ID: From: "Karim Chibout" RIAO 2000 6th Conference on "Content-Based Multimedia Information Access" College de France (Paris, France) April 12-14, 2000 _______________ Final Announcement _______________ Organized by: C.I.D. (France) and C.A.S.I.S. (USA) Under the sponsorship of the European Commission, the French Ministry of Education, Research and Technology, the DGA, the CEA, ELRA and ELSNET With the collaboration of AII, ASIS, ESCA and AUF/Francil _______________ For the Final Conference Program and Registration, please visit the Web site: http://host.limsi.fr/RIAO ______________ The theme of the conference is "Content-Based Multimedia Information Access". The conference scope will range from the traditional processing of text documents to the rapidly growing field of automatic indexing and retrieval of images and speech and, more generally, to all processing of audio-visual and multimedia information on various distribution venues, including the Net. The conference is of interest to several scientific communities, including Information Retrieval, Natural Language Processing, Spoken Language Processing, Computer Vision, Human-Computer Interaction and Digital Libraries. RIAO 2000 will thereby serve as a forum for cross-discipline initiatives and innovative applications. RIAO 2000 will present recent scientific progress, demonstrations of prototypes resulting from this research, as well as the most innovative products now appearing on the market. The Conference Advance Program is highlighted by contributions of authors from 26 countries.
The program includes 2 invited speakers, 3 panel sessions, 3 plenary sessions, 8 poster sessions and 16 oral sessions. Among all sessions are 145 papers (75 oral and 70 poster presentations), providing a unique opportunity to present and discuss in depth the state of the art in this rapidly growing scientific and technological field. There will also be many innovative application demonstrations presented by companies from different countries. The application committee has already selected about 20 of them, covering various applications such as crosslingual English-Arabic Internet search, recognition of printed and handwritten texts, television archives retrieval, sign language indexing, machine translation, etc. For more information on the program, conference location and registration, please visit the Web site: http://host.limsi.fr/RIAO or contact us at: - For all scientific matters: riao2000 at limsi.fr - For all organizational, technical and practical matters: cidcol at club-internet.fr ------------------------------------- Joseph MARIANI LIMSI-CNRS BP 133 91403 Orsay Cedex (France) Tel.: (33/0) 1 69 85 80 85 Fax: (33/0) 1 69 85 80 88 Email: mariani at limsi.fr Web: http://www.limsi.fr/ **************************************** Karim Chibout FRANCIL LIMSI-CNRS B.P. 133 91403 Orsay Cedex FRANCE telephone: (+33/0) 1.69.85.80.66 fax: (+33/0) 1.69.85.80.88 email: chibout at limsi.fr http://www.limsi.fr/Individu/chibout/ ******************************************** From pb at lpl.univ-aix.fr Tue Apr 18 16:54:09 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Tue, 18 Apr 2000 18:54:09 +0200 Subject: Jobs: 5 Offers Message-ID: ____________________________________________________________________________ 1/ From: Philippe Monnier Subject: Appel à Candidature 2/ From: MIT2USA at aol.com Subject: MIT2 / CreoleTrans Joint Press Release 3/ From: Naomi Hallan Subject: Chemnitz University 4/ From: Anna Bjarnestam Subject: Linguistic Programmer at Getty Images, Washington USA 5/ From: Kathleen Black Subject: Comp Linguist at Cycorp, Inc., Austin Texas USA ____________________________________________________________________________ 1/ From: Philippe Monnier Subject: Appel à Candidature [Translated from French] POSITION: HEAD OF RESEARCH As part of a large-scale technological innovation project supported by ANVAR, the company STARTEM, which specializes in the management of international and multimedia information, is recruiting its head of research and development. The successful candidate will work in a multidisciplinary setting (computer science, linguistics, ergonomics) on the development of an information-processing chain integrating new information and communication technologies, within a cross-functional project team whose objective is increased productivity. Within the information-processing chain: - They will support the company in choosing technological solutions and oversee their deployment. - They will coordinate the integration of multilingual NLP modules (text categorization; information extraction; automatic text generation). - They will be responsible for the technical cases made for the company's technology choices to institutional bodies likely to be interested in the project. - They will coordinate the various collaborations with research laboratories and supervise the interns and postdocs working on the project. Profile: - Education: PhD or experienced engineer - Areas of expertise: computer science, NLP, linguistics - Essential computing skills: Java; Perl; C++; XML/SGML - Industry experience is required - Experience with European projects would be a plus - Further training in project management would be an asset - Fluent English required (spoken/written). Procedure: Send a CV and cover letter to Marion Denneulin Email: mdenneulin at cmc.fr STARTEM 60 rue de Ponthieu 75008 Paris ____________________________________________________________________________ 2/ From: MIT2USA at aol.com Subject: MIT2 / CreoleTrans Joint Press Release Strategic Partnering for Haitian Creole Translation Services Mason Integrated Technologies Ltd (MIT2), a software developer and service provider specialized in language processing solutions for Creole languages, and CreoleTrans, a Haitian Creole language translation service provider, announce the forming of a strategic partnership to co-market and cross-sell their workflow management technologies and translation services in order to expand the functionality and effectiveness of both companies. MIT2, creator of the range of CreoleScan(tm) OCR and CreoleConvert(tm) orthography conversion software programs, provides written language stabilization solutions and productivity tools for processing texts in Creole and other minority and vernacular languages. CreoleTrans comprises an experienced team of Haitian Creole translators and editors and has a broad customer base including publishers, universities, schools and education systems, and government agencies.
CreoleTrans would be the first Haitian Creole translation provider to use MIT2's software tools for producing and validating Creole (source or target language) texts. For more information, please contact: Mason Integrated Technologies Ltd P.O. Box 181015, Boston, Massachusetts 02118 USA Tel: (+1) 617 247-8885, Fax: (+1) 617 262-8923 E-mail: mit2usa at aol.com Web: http://hometown.aol.com/mit2usa/Index2.html CreoleTrans 470 NE 210 Circle Terrace #203, Miami, FL 33179 USA Tel: (+1) 305 770-9252, Fax: (+1) 305 690-5933 E-mail: info at creoletrans.com Web: http://www.creoletrans.com/ ******* Mason Integrated Technologies Ltd P.O. Box 181015 Boston, MA 02118 USA (617) 247-8885 (office & answering machine) (617) 262-8923 (FAX) MIT2USA at aol.com (e-mail) Mason Integrated Technologies Ltd Home Page: http://hometown.aol.com/mit2usa/Index2.html Orthographically Converted HC Texts Download Site: http://hometown.aol.com/mit2haiti/Index4.html Meet Marilyn Mason: http://hometown.aol.com/marilinc/Index3.html MIT2 Job Opportunities http://hometown.aol.com/mit2usa/JobOpps.html ____________________________________________________________________________ 3/ From: Naomi Hallan Subject: Chemnitz University WANTED: Graduate (Languages/Linguistics/Teaching/...) with good Internet and computing skills The English Linguistics department at the Chemnitz University of Technology is looking for someone to join an on-going research project. The post would initially be for 18 months, with the possibility of an extension if there is a further phase of the project, starting on 1st June 2000 or as soon as possible thereafter. Payment is on the BAT IIa (Ost) scale, with the salary level dependent on age and experience. The project, "Learner Behaviour in the Internet Grammar", http://www.tu-chemnitz.de/InternetGrammar/, is part of an inter-disciplinary research group, "New media in everyday life", funded by the German Research Foundation (Deutsche Forschungsgemeinschaft). What is the Internet Grammar?
We are building a grammar-learning environment, accessible using a web browser, for advanced learners of English. We are using material from a variety of corpora, including our own English-German Translation Corpus, to provide examples and material for exercises wherever possible. The software infrastructure we have designed makes it possible to track user behaviour in great detail, and we hope to discover how different types of learners interact with language-teaching material presented on the Web. What would you do? We need someone to take over the care of our software infrastructure and help us develop it further. You would work with the other members of the team, who are responsible for designing and writing content and, in part, for preparing it for insertion in the grammar. Your tasks would be: (a) to maintain the existing structures, which involve CGI scripts, corpus search facilities and interactive animations, as well as the more conventional elements of a website; (b) to assist with the extraction and analysis of learner data; (c) to help extend the functionality of the grammar, both for users and researchers, through the development of new features and the improvement of existing ones. Qualifications: You will have a degree in a relevant subject and be able to demonstrate the necessary software skills - such as CGI and Perl or JavaScript, in addition to HTML - or a willingness to acquire them very rapidly. You should also have an interest in the use of the Web and corpora for language teaching and learning. You should enjoy working in a team and value the opportunity to help with the further development of our project. The working language of the project is English, so fluency would be an advantage. Working in Chemnitz? Apart from the satisfaction of helping to see an exciting research project to its completion, you would have the advantage of a stimulating university environment in a city which is growing and changing every day.
Low rents for well-equipped apartments in elegant newly restored houses; leafy suburbs, beautiful countryside; a varied and vigorous cultural life (opera, world-class art exhibitions, cabaret . . . ); all these in one of the most enterprising and economically active cities in the "new" German states.

Please send a CV and covering letter as soon as possible to: Prof. Dr. Josef Schmied Englische Sprachwissenschaft Technische Universität Chemnitz D-09107 Chemnitz, Germany. e-mail to: realcentre at phil.tu-chemnitz.de For more information about the project: http://www.tu-chemnitz.de/InternetGrammar/

____________________________________________________________________________

4/ From: Anna Bjarnestam Subject: Linguistic Programmer at Getty Images, Washington USA Rank of Job: Full Time Permanent Areas Required: Linguistic Programmer Other Desired Areas: Technology University or Organization: Getty Images Department: Getty Technology Group State or Province: Washington Country: USA Final Date of Application: 05/30/2000 Contact: Anna Bjarnestam anna.bjarnestam at gettyimages.com Address for Applications: Getty Images, 701, N 34th Street, Suite 400 Seattle WA 98103 USA

Job - Linguistic Programmer

Responsibilities - Getty Images has vast sources of text attached to imagery that need to be indexed automatically in some manner for searching and retrieval purposes. Primary responsibilities involve development of a semantic or syntactic tagger for natural language English. The tagger should be based on a controlled vocabulary developed and currently in use at Getty Images. The most important aspect of the work is the programming of these NLP tools, rather than finding linguistic solutions for functionality designs etc. Other job tasks involve metadata integration projects, various smaller NLP tool developments and machine-readable vocabulary development.
This work is mainly for the creative professional (gettyone) and the editorial Getty Images channels (gettysource); see http://www.gettyimages.com

Qualifications -
- Strong programming skills (knowledge of C++, Perl or other)
- Some knowledge of grammatical theories is preferred
- Some understanding of NL parsing theory (which may include statistical and/or corpus-based parsing methods, tagger development)
- Experience in computational lexicography or computational linguistics and online dictionary development, and awareness of current NLP technology and available vocabularies
- A degree in linguistics or computer science or a closely related discipline is preferred.

____________________________________________________________________________

5/ From: Kathleen Black Subject: Comp Linguist at Cycorp, Inc., Austin Texas USA Rank of Job: -- Areas Required: -- Other Desired Areas: -- University or Organization: Cycorp, Inc. Department: Natural Language Development State or Province: TX Country: USA Final Date of Application: none Contact: Kathleen Black kat at cyc.com Address for Applications: 3721 Executive Center Drive, Suite 100 Austin TX 78731 USA

Cycorp (http://www.cyc.com/) has begun to harness the power of its Cyc(TM) common sense knowledge base and reasoning system to do semantic and pragmatic disambiguation of English. Currently we are working on new and exciting clarification dialogue interfaces for Cyc itself and for Cyc-based applications. These include applications for smart Web searching, question-and-answer dialogues, and speech understanding, to name just a few. Join the team building this one-of-a-kind interactive dialogue front end. You will create formal representations of natural language expressions and phenomena, as well as develop applications to exploit such representations. Candidates for these positions must be familiar with formal logic, and have sound fundamentals in English usage, syntax, and semantics.
In addition, one or more of the following would be a plus:
- Knowledge of discourse structure, pragmatics, and dialogue modeling
- Experience with the influence of semantic distinctions on syntax
- Familiarity with formal semantic analysis
- Facility with knowledge representation and other AI tools and techniques
- Knowledge of constraint-based grammatical theories
- Understanding of NL parsing theory (statistical, corpus-based, etc.)
- Experience in applying that knowledge (computational NLU systems)
- Experience in computational lexicography
- Experience in natural language generation
- Knowledge of NL interface design and human cognitive considerations
- Programming skills, especially in Lisp or Scheme

General Information: All technical positions at Cycorp involve working with the Cyc(TM) technology -- an immense, broad, multi-contextual knowledge base and efficient inference system which our group has developed over the last 16 years and 400 person-years. The Cyc knowledge base, spanning fundamental "consensus" human knowledge, enables a multitude of knowledge-intensive products and services which will revolutionize the way in which people use and interact with computers: semantic information retrieval, consistency-checking of structured information, deductive integration of heterogeneous databases, natural language interfaces able to cope with realistic levels of ambiguity/terseness/contextualization, and many more. Cycorp is located in Austin, TX. We are an equal opportunity employer. For more information about employment at Cycorp, visit our website at http://www.cyc.com/employment.html For immediate consideration, please send your resume and a cover letter to Kathleen Black at the following address: Cycorp, Inc.
3721 Executive Center Drive, Suite 100 Austin TX 78731 Internet: info at cyc.com Telephone: +1 (512) 342-4000 Fax: +1 (512) 342-4040

No person shall be excluded from consideration for recruitment, selection, appointment, training, promotion, retention, or any other personnel action, or be denied any benefits or participation in any activities on the grounds of race, religion, color, national origin, sex, handicap or age. Cycorp will hire only persons authorized to work in the United States and will verify identity and eligibility for employment, and complete form I-9 for all new employees within three (3) business days of the date of hire.

___________________________________________________________________ Message diffusé par la liste Langage Naturel Informations, abonnement : http://www.biomath.jussieu.fr/LN/LN-F/ English version : http://www.biomath.jussieu.fr/LN/LN/ Archives : http://web-lli.univ-paris13.fr/ln/

From pz at biomath.jussieu.fr Wed Apr 19 11:44:32 2000 From: pz at biomath.jussieu.fr (Pierre Zweigenbaum) Date: Wed, 19 Apr 2000 12:44:32 +0100 Subject: R: Debugging computational grammars Message-ID: Date: Tue, 18 Apr 2000 08:42:50 +0100 From: Christian Boitet Message-Id:

Dear colleague, 18/4/2000

Sorry I left this unanswered for so long. In Ariane-G5 (presented at the last ATALA workshop on tools; more on my web site and in the literature), we have developed quite powerful tools to debug our grammars, which are actually made of modules consisting of transformation rules.

1) Only a modular organisation of the rule systems allows one to do that efficiently. In ROBRA, for instance, we have 3 levels: transformational system, grammar, rule.

2) Provide for a variety of traces, at each of these levels, of course in terms of the external specialized language (SLLP) used by the linguist-developer.

3) If you develop really large applications, a specification level such as that of Vauquois' "static grammars" (ref.
in Vauquois Analects 1989 & in Zarin's articles) is most crucial to create and maintain your computational grammars in an orderly way. As an anecdote, I was personally able to debug a particular point of an English-Thai prototype without knowing anything about the computational grammars developed, or even about Thai. I was with Pr. Udom Warutamasikkhadit, who told me something was wrong in a translation. I ran the sentence in question, tracing the output trees after AS (structural analysis), before GS (structural + syntactic generation), and after it. Looking at the static grammar "boards" for Thai, he told me the input to generation was OK, the output not. I then went to the ROBRA module and the 3-4 rules indicated in the board as implementing the construct in question, and easily detected that one rule schema did not correspond to the board. I corrected it, reran, and it worked. All this in less than 15 minutes, and the computational grammars in question represented (for analysis, transfer and generation) several hundred pages in source form (including, of course, comments).

4) One originality in ROBRA (our language for writing tree transformational systems) is that tracing and stepping are modular and graded:
- one may trace or not each main step of the automaton: prefiltering, labelling (= finding all possible rule occurrences), choice (conflict resolution and production of the set of rule occurrences to apply in parallel to the tree -- which may represent several paragraphs and contain thousands of nodes), and transformation proper.
- there are 4 levels of (global and dynamic) tracing, 1-4, and each trace point also has a (local and static) tracing grade. Whether the trace is produced depends on whether, at a given point, the sum exceeds 4. In this way, one can step and see more or less detail.
- in the same spirit, any active tree (contained in the stack) can be visualized in 4 or 5 geometric forms, with only the lexical units, or with the complete decoration of each node.
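The graded tracing rule just described can be sketched as follows (a minimal illustration in Python, not actual ROBRA/SLLP code; the function name is hypothetical):

```python
# Sketch of ROBRA's graded tracing rule as described above: the user
# sets a global, dynamic trace level (1-4); each trace point carries a
# local, static grade; a trace is emitted only when the sum of the two
# exceeds 4. (Hypothetical illustration, not the real implementation.)

def should_trace(global_level: int, local_grade: int) -> bool:
    """Return True when a trace point fires under the graded scheme."""
    return global_level + local_grade > 4

# At the lowest global level only the highest-grade points fire;
# raising the level progressively reveals more detail.
assert should_trace(1, 4)       # most important trace points only
assert not should_trace(1, 3)   # suppressed at a low global level
assert should_trace(4, 1)       # at level 4, nearly everything fires
```

The additive scheme means each trace point needs only one static grade, yet the user can still dial the overall verbosity up or down with a single global setting.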
- in a future implementation, we should add a graphical, mouse-sensitive interface, allowing individual nodes to be examined by clicking on them. This domain is still full of interesting possibilities.

Yours very sincerely, Ch. Boitet

>Date: Thu, 23 Sep 1999 10:23:59 +0200 (MET DST)
>From: Alberto Lavelli
>Message-Id: <199909230824.KAA16161 at ecate.itc.it>
>
>Dear colleagues,
>
>I'm looking for references on the problem of developing and debugging
>computational grammars for natural languages. I'm particularly
>interested in tools and approaches used in debugging grammars
>(particularly in their use when dealing with relatively large
>hand-written grammars). In the computational systems I'm aware of,
>there is usually only a limited (and standard) set of debugging tools:
>tracers, steppers, chart browsers.
>
>Furthermore, does anybody know of any extensive study on the most
>suitable strategies/tools to cope with the writing/testing/debugging
>cycle (always with a particular emphasis on debugging)?
>
>I know that there have been hints at this problem in related areas
>(e.g., the EU projects TSNLP and DiET, some papers at the ACL-EACL97
>workshop on Computational Environments for Grammar Development and
>Linguistic Engineering) but it seems to me that this topic has so far
>received little attention. But perhaps I'm missing some relevant
>contributions, so I'm asking for your help.
>
>Apart from references to relevant material, I'm also interested in your
>general opinion on the issue. Is this (alleged) lack of interest an
>indication that, in your opinion, the issue is not particularly
>relevant?
>
>I'll post a summary if I receive enough responses.
>
>best
> alberto
>
>ps: I have sent this message to several mailing lists. I apologize if
>you receive it more than once.

------------------------------------------------------------------------- Christian Boitet (Pr.
Université Joseph Fourier) Tel: +33.4-7651-4355/4817 GETA, CLIPS, IMAG-campus, BP53 Fax: +33.4-7651-4405 385, rue de la Bibliothèque Mél: Christian.Boitet at imag.fr 38041 Grenoble Cedex 9, France Mobile: +33-(0)6-6005-1969 http://www-clips.imag.fr/geta/christian.boitet ------------------------------------------------------------------------- C-STAR project (http://www.c-star.org/) and European project Nespole (http://nespole.itc.it) on speech translation; UNL project on multilingual communication and information retrieval on the network: http://www.unl.ias.unu.edu or http://www.unl.org

From pb at lpl.univ-aix.fr Thu Apr 20 08:08:18 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Thu, 20 Apr 2000 10:08:18 +0200 Subject: Appel: Linguistic Exploration Workshop Message-ID: From: Steven Bird

First Announcement: LINGUISTIC EXPLORATION WORKSHOP 12-15 December 2000 Institute for Research in Cognitive Science University of Pennsylvania, Philadelphia Organized by Steven Bird (U Penn) and Gary Simons (SIL) http://www.ldc.upenn.edu/exploration/

Linguistic Exploration is a theme which unites linguists and computational linguists who are engaged in empirical research on large datasets through the combination of traditional field methods with new technologies for representing, investigating and disseminating linguistic data. The languages under study may range from the undescribed to the well-studied, and the "fieldworker" may operate in a village or a laboratory. The focus is language documentation, coupled with an exploratory mode of research where elicitation, analysis and hypothesis-testing form a tight loop.
At the January LSA meeting in Chicago, a one-day workshop was held on computational infrastructure for linguistic fieldwork. Full materials from this workshop, including abstracts, presentations and audio recordings, are online at http://www.ldc.upenn.edu/exploration/LSA/. A second workshop will be held in Philadelphia in December 2000. The goal of this workshop is to align the many parallel efforts in this area, and to establish a research agenda which will provide the infrastructure for a new generation of computational tools. Please bookmark http://www.ldc.upenn.edu/exploration/ and join the mailing list to be sure of receiving future announcements. -- Steven Bird sb at ldc.upenn.edu http://www.ldc.upenn.edu/sb

From pb at lpl.univ-aix.fr Thu Apr 20 08:08:19 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Thu, 20 Apr 2000 10:08:19 +0200 Subject: Conf: INLG'2000 Message-ID: From: "International Natural Language Generation Conference - Dr. Elhadad"

International Natural Language Generation INLG'2000 Mitzpe Ramon, Israel Workshops: 12 June 2000 Main conference: 13-16 June 2000

First Call For Participation

The First International Natural Language Generation Conference (INLG'2000) will be held June 12 to 16, 2000 in Mitzpe Ramon, Israel. This conference continues the tradition of the nine biennial workshops on natural language generation held from 1980 to 1998. INLG'2000 will offer a larger audience the opportunity to participate in the main meeting of researchers in the field. Following the tradition of previous INLG meetings, the conference will be held in an isolated and stunning natural environment: the Ramon Inn hotel, in Mitzpe Ramon, Israel.
The hotel is located on the edge of the Ramon Crater, in the middle of the Negev Desert.

Conference main topics:
* Generation and summarization
* Multimodal and multimedia generation
* Multilingual generation
* Concept to speech, models of intonation
* Strategic generation for text and dialogue
* Text planning, discourse models, argumentation strategies, content selection and organization
* Tactical generation, formalisms and models of grammar, sentence aggregation, lexical choice
* Architecture of generators
* Knowledge acquisition and resources for generation and summarization
* User-customized generation and summarization
* Psychological modeling of discourse production
* Learning methods for generation
* Evaluation methodologies for generation and summarization
* Applications of generation, concept-to-speech, information extraction and information retrieval techniques to summarization, report generation, explanation.

The conference is organized in four tracks: 1. Main session 2. Student session 3. Workshops 4. Special session on evaluation in generation

The registration form for the main conference is available at our homepage: http://www.cs.bgu.ac.il/~nlg2000 The registration form for the workshops will be available soon, as well as the full program. Registration will be accepted until May 15th. After this date, a late registration fee will be required.
------------------------------------------------------------------------ Programme Committee * Michael Elhadad, Ben Gurion University, Israel (Chair) * Stephan Buseman, DFKI, Germany * Graeme Hirst, University of Toronto, Canada * James Lester, North Carolina State University, USA * Inderjeet Mani, The MITRE Corporation, USA * Kathy McCoy, University of Delaware, USA * David McDonald, Gensym Corp, USA * Dragomir Radev, University of Michigan, USA * Jacques Robin, Federal University of Pernambuco, Brazil * Donia Scott, University of Brighton, UK * Manfred Stede, Technical University, Berlin, Germany * Matthew Stone, Rutgers University, USA * Ingrid Zukerman, Monash University, Australia

Student Session * Irene Langkilde, University of Southern California - ISI * Charles Brendan Callaway, North Carolina State University * James Shaw, Columbia University

Special Session on Evaluation * Inderjeet Mani, The MITRE Corporation

Equipment Availability: Presenters will have available an overhead projector, a slide projector, a data projector (Barco) which will display from laptops, and a VHS (PAL) videocassette recorder. NTSC format may be available; if you anticipate needing NTSC, please note this information in your proposal. Requests for other presentation equipment will be considered by the local organizers; requests for special equipment should be directed to the local organizers no later than May 15, 2000.

------------------------------------------------------------------------ Local Arrangements * Michael Elhadad elhadad at cs.bgu.ac.il * Yael Dahan Netzer yaeln at cs.bgu.ac.il Dept. of Computer Science Ben Gurion University P.O.Box 643 Beer Sheva 84105 Israel ------------------------------------------------------------------------

From pb at lpl.univ-aix.fr Thu Apr 20 08:07:58 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Thu, 20 Apr 2000 10:07:58 +0200 Subject: Appel: TALN-2000 (Deadline extension) Message-ID: From: TALN2000

**********************************************************************
*                           TALN 2000                                *
*            Traitement Automatique du Langage Naturel               *
*           École Polytechnique Fédérale de Lausanne                 *
*                    du 16 au 18 octobre 2000                        *
*               http://liawww.epfl.ch/taln2000/                      *
**********************************************************************

LAST CALL FOR PAPERS - DEADLINE EXTENSION

TALN 2000 Swiss Federal Institute of Technology Lausanne, Switzerland 16-18 October 2000

CALENDAR submission deadline: May 5th, 2000 notification to authors: June 23rd, 2000 final version due (camera-ready): August 4th, 2000 conference: 16-18 October, 2000

Jointly organised by the Swiss Federal Institute of Technology (Lausanne) and the University of Geneva, the Seventh Conference on Natural Language Processing (TALN 2000) will be held at the Swiss Federal Institute of Technology (Lausanne, Switzerland) on October 16-18, 2000. The conference includes paper presentations, invited speakers, tutorials and software demonstrations. The official conference languages are French and English. TALN 2000 is organised in collaboration with ATALA (Association pour le Traitement Automatique des LAngues) and will be held jointly with the young researcher conference RECITAL 2000 (a separate call for papers will follow).
TOPICS Papers are invited for 30-minute talks (including questions) in all areas of NLP, including (but not limited to): lexicon, morphology, syntax, semantics, pragmatics, discourse, parsing, generation, abstraction/summarisation, dialogue, translation, logical, symbolic and statistical approaches, mathematical linguistics.

TALN 2000 also welcomes contributions in fields for which NLP plays an important role, as long as these contributions emphasise their NLP dimension: speech processing, text processing, cognition, terminology, knowledge acquisition, information retrieval, documentary research, corpus-based linguistics, mathematical linguistics, management and acquisition of linguistic resources, computer-assisted learning, NLP tools for linguistic modelling.

TALN 2000 also welcomes submissions focusing on NLP applications that have been implemented, tested and evaluated, emphasising the scientific aspects and conclusions drawn. Software demonstrations can be proposed, either independently or in connection with a paper proposal. Specific sessions for the demos will be scheduled in the program of the conference. The program committee will select 2 papers from among the accepted papers for publication (in an extended version) in the journal "Traitement Automatique des Langues" (t.a.l.). For the journal, these papers will have the status "accepted, subject to modifications", the modifications being the formatting according to the style of the journal.

SELECTION Authors are invited to submit original, previously unpublished work. Submissions will be reviewed by at least 2 specialists of the domain. Decisions will be based on the following criteria:
- importance and originality of the paper
- soundness of the scientific and technical content
- comparison of the results obtained with other relevant work
- clarity
- relevance to the topics of the conference

Accepted papers will be published in the proceedings of the conference.
SUBMISSION PROCEDURE The maximum length for papers is 10 pages, in Times 12 (approx. 3000 words), single-spaced, including figures, examples and references. The maximum length for demo proposals is 3 pages. A LaTeX style file and a Word template will be available on the web site of the conference (http://liawww.epfl.ch/taln2000). Electronic submissions must reach the organising committee by May 5th, 2000 at the latest, at the following address: taln2000 at latl.unige.ch If electronic submission is not possible, 3 hard copies of the paper must reach the organising committee before April 21st, 2000, at the following address: Eric Wehrli - TALN 2000 Dépt. de linguistique - LATL Université de Genève 2, rue de Candolle CH-1211 Genève 4 Suisse

File format for electronic submissions: Authors should send their submission as a file attached to an e-mail, with the subject field "TALN submission" and containing the following information: submission title, first author's name, affiliation, postal address, e-mail address, phone and fax number. The submissions are ANONYMOUS, and should therefore not include the authors' names or any self-reference. One of the following formats MUST be used:
- self-contained LaTeX source (including non-standard styles) AND a PostScript version.
- RTF (Word) document AND a PostScript or PDF version.
All PostScript versions must be for A4 paper, not US letter.

PRACTICAL INFORMATION: Practical information will be detailed shortly on the conference web site (http://liawww.epfl.ch/taln2000/).

----------------------------------------------------------------------

COMITÉ D'ORGANISATION/ORGANIZING COMMITTEE Eric Wehrli (Président/President) Martin Rajman Cristian Ciressan Jean-Cédric Chappelier Marie Decrauzat Paola Merlo Christopher Laenzlinger

COMITÉ DE PROGRAMME/PROGRAM COMMITTEE Pascal Amsili, TALaNa (Paris) Susan Armstrong, ISSCO (Genève) Nicholas Asher, University of Texas (Austin) Afzal Ballim, EPFL (Lausanne) Philippe Blache, LPL (Aix-en-Provence) Christian Boitet, CLIPS-GETA (Grenoble) Pierrette Bouillon, ISSCO (Genève) Didier Bourigault (CNRS, Paris) Jean-Pierre Chanod, XEROX Research Center (Grenoble) Cédric Chappelier, EPFL (Lausanne) Béatrice Daille, IRIN (Nantes) Dominique Estival, University of Melbourne Claire Gardent, Universität des Saarlandes (Saarbrücken) Damien Genthial, CLIPS-IMAG (Grenoble) Gregory Grefenstette (XEROX) Michael Hess, Uni Zurich Pierre Isabelle, XEROX Research Center (Meylan) Daniel Kayser, LIPN (Paris) Geert-Jan Kruijff, Univerzita Karlova (Praha) Eric Laporte, CERIL, Université de Marne-la-Vallée Paola Merlo, LATL (Genève) Piet Mertens, CCL K.U.
Leuven Jacques Moeschler, LATL (Genève) Cécile Paris, CSIRO (Sydney) Jean-Marie Pierrel, LORIA (Nancy) Alain Polguère, Université de Montréal Martin Rajman, EPFL (Lausanne) Owen Rambow, ATT Labs-Research Gérard Sabah, LIMSI (Paris) Jacques Savoy, Uni Neuchatel Jacques Vergne, GREYC (Caen) Jean Véronis, LPL (Aix-en-Provence) Eric Wehrli, LATL (Genève) Francois Yvon, ENST (Paris) Brigitte Zellner Keller (UNIL, Lausanne) Pierre Zweigenbaum, DIAM (Paris)

Contact: Eric Wehrli - TALN 2000 Dépt. de linguistique - LATL Université de Genève 2, rue de Candolle CH-1211 Genève 4 Switzerland Tel: +41-22-705.73.63 Fax: +41-22-705.79.31 email: taln2000 at latl.unige.ch http://liawww.epfl.ch/taln2000/

-- For the organising committee of TALN 2000, J.-C. Chappelier

From pb at lpl.univ-aix.fr Tue Apr 25 17:54:03 2000 From: pb at lpl.univ-aix.fr (Philippe Blache) Date: Tue, 25 Apr 2000 19:54:03 +0200 Subject: Appel: RECITAL 2000 Message-ID: From: Damien Genthial

----------------------------------------------------------------------

Final Call for Papers NEW SUBMISSION DEADLINE: 5 May 2000

RÉCITAL-2000 Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues École Polytechnique Fédérale de Lausanne (Switzerland) 16-18 October 2000 http://www-clips.imag.fr/RECITAL-2000 RECITAL-2000 at imag.fr held jointly with TALN-2000 http://liawww.epfl.ch/taln2000

The fourth edition of the RECITAL colloquium will be held jointly with
la conf?rence TALN-2000 (Traitement Automatique des Langues Naturelles), ? Lausanne (Suisse) du 16 au 18 octobre 2000. Le colloque RECITAL-2000 donne aux jeunes chercheurs l'occasion de se rencontrer, de pr?senter leurs travaux et de comparer leurs approches. L'exp?rience tr?s r?ussie du couplage RECITAL-TALN en 1999 sera renouvel?e cette ann?e encore et devrait l'?tre syst?matiquement ? l'avenir.Ce couplage permet aux participants d'acqu?rir une vision plus globale sur les avanc?es th?oriques les plus r?centes, ainsi que sur des applications industrielles actuellement d?velopp?es. Il favorise ?galement les ?changes avec les chercheurs confirm?s qui participent ? TALN. * Th?mes indicatifs Probl?matiques classiques : - Analyse et compr?hension de textes - G?n?ration d'?nonc?s en LN - Traduction automatique - Production de r?sum?s - Correction automatique - Dialogue humain/machine Probl?matiques connexes : - Ressources linguistiques (lexiques, dictionnaires ?lectroniques, corpus etc.) - S?mantique lexicale (polys?mie, cat?gorisation, etc.) - S?mantique du temps et de l'espace (repr?sentation et raisonnement) - Logique (argumentation, mod?lisation des intentions et des croyances, etc.) - Architectures d?di?es au TAL (syst?mes multi-agents, r?seaux neuromim?tiques) - Acquisition et apprentissage automatique de ressources ou de connaissances (? partir de corpus, ou de l'interaction humain/machine) - Relations entre TAL et reconnaissance de la parole Cette liste n'est pas exhaustive et l'ad?quation d'une proposition de communication ? la conf?rence sera jug?e par le comit? de programme. * Soumissions Les soumissions (de 6 pages maximum en Times 12) devront ?tre r?dig?es en fran?ais ou en anglais par de jeunes chercheurs (th?se en cours ou bien soutenue apr?s septembre 1999) et accompagn?es d'un r?sum? de 200 mots. RECITAL-2000 est la session pour ?tudiants de TALN-2000 et il est possible pour un auteur de soumettre ? plusieurs manifestations (TAL, RECITAL, ateliers). 
In that case, the double submission must be clearly indicated in the messages sent to both program committees. Submissions to RECITAL-2000 will not be anonymous, and reviewers will be encouraged to sign their reports. Submissions are exclusively electronic and must be sent as attachments to RECITAL-2000 at imag.fr, in PostScript, RTF or PDF. A LaTeX style file and a Word template are available on the TALN-2000 web pages (see above). Accepted papers will be presented during the RECITAL sessions as twenty-minute talks and will appear in the TALN-2000 proceedings.

* Important dates (same schedule as TALN-2000)
Submission deadline: May 5, 2000
Notification to authors: June 23, 2000
Final (camera-ready) version: August 4, 2000
Conference: October 16-18, 2000

* Program committee
Chair: Damien Genthial (mailto:Damien.Genthial at imag.fr)
Pascal Amsili (TALANA, Paris)
Pierre Beust (GREYC, Caen)
Jean Caelen (CLIPS, Grenoble)
Paul Deléglise (LIUM, Le Mans)
Cécile Fabre (ERSS, Toulouse)
Bertrand Gaiffe (LORIA, Nancy)
Emmanuel Giguet (GREYC, Caen)
Stéphane Ferrari (GREYC, Caen)
Brigitte Grau (LIMSI, Orsay)
Maurice Gross (LADL, Paris)
Jean-Luc Husson (LORIA, Nancy)
Eric Laporte (Marne-la-Vallée)
Jérôme Lehuen (LIUM, Le Mans)
Gérard Ligozat (LIMSI, Orsay)
Daniel Luzzati (LIUM, Le Mans)
Denis Maurel (LI, Tours)
Reza Mir-Samii (LIUM, Le Mans)
Jacques Moeschler (Genève)
Philippe Muller (IRIT, Toulouse)
Anne Nicolle (GREYC, Caen)
Didier Pernel (L&H, Belgium)
Violaine Prince (Paris 8)
Martin Rajman (EPFL, Lausanne)
Laurent Romary (LORIA, Nancy)
Azim Roussanaly (LORIA, Nancy)
Gérard Sabah (LIMSI, Orsay)
Patrick Saint-Dizier (IRIT, Toulouse)
Jean Senellart (LADL, Paris)
Jacques Siroux (IRISA LLI/CORDIAL, Rennes)
Max Silberztein (LADL, Paris)
Jacques Vergne (GREYC, Caen)
Jean Véronis (LPL, Aix)
Anne Vilnat (LIMSI, Orsay)
Michael Zock (LIMSI, Orsay)

* Organizing committee
Chair: José Rouillard (CLIPS-IMAG Grenoble)
Pierre Beust (GREYC Caen)
Peggy Cadel (LILLA Nice)
Jean Caelen (CLIPS-IMAG Grenoble)
Damien Genthial (CLIPS-IMAG Grenoble)
Stéphanie Pouchot (CRISTAL-GRESEC Grenoble)
Dominique Vaufreydaz (CLIPS-IMAG Grenoble)

For all enquiries:
RÉCITAL-2000 web page: http://www-clips.imag.fr/RECITAL-2000
RÉCITAL-2000 e-mail address: RECITAL-2000 at imag.fr
Postal address: Colloque RÉCITAL-2000, attn. Damien Genthial, CLIPS, IMAG-Campus, BP 53, 38040 Grenoble cedex

* Practical information
For practical information about accommodation, access to the Lausanne site, etc., please see the TALN web pages (http://liawww.epfl.ch/taln2000).
-------------------------------------------------------------------
TRILAN/CLIPS/IMAG (Mondays and Thursdays) | IUT de Valence (other days)
Tel: 04 76 51 49 51 | Tel: 04 75 41 88 00

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:12 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:12 +0200
Subject: Appel: CULT 2000
Message-ID:
From: "cult2k"

CULT 2000 CONFERENCE ANNOUNCEMENT AND CALL FOR PAPERS
(with apologies if you receive multiple copies)

Second International Conference on
CORPUS USE AND LEARNING TO TRANSLATE
Bertinoro, Italy
Friday 3 November and Saturday 4 November 2000

AIMS AND TOPICS
CULT 2000 follows up the 2-day International Workshop organized by the School for Interpreters and Translators of Bologna University in Forlì in November 1997 (http://www.sslmit.unibo.it/cult.htm).
The aim of the Conference is to bring together practitioners and theorists sharing an interest in the design and use of corpora in translation-related areas, with special reference to translator and interpreter training. Contributions in the form of papers, demonstrations and posters are sought on the following topics:
- Translation/interpreting-specific issues relating to the design, development and use(s) of corpora
- Integrating corpus work into courses for translators/interpreters
- Corpus-based language learning/teaching for translators/interpreters
- Implications of corpus use with respect to theories of translation/interpreting
- The respective roles of corpora, conventional reference tools, and other computational translation aids
- The World Wide Web as a resource for translation/interpreting
- Corpora and terminology
- Corpus-based descriptive translation studies in the classroom

KEYNOTE SPEAKERS
Kirsten Malmkjær (Middlesex University)
Tony McEnery (Lancaster University)

VENUE
Bertinoro is a beautiful little town in the Romagna hills, renowned for its warm hospitality and its good wine. The University Conference Centre is set in a recently renovated medieval fortress dominating the town. The view stretches from the mountains of Tuscany to the Adriatic sea.
You can have a look for yourself at: http://www.spbo.unibo.it/bertinoro/eindice.html

SCIENTIFIC COMMITTEE
Guy Aston (University of Bologna)
Mona Baker (UMIST - Manchester)
Lynne Bowker (University of Ottawa)
Jennifer Pearson (Dublin City University)
Stig Johansson (University of Oslo)
Krista Varantola (University of Tampere)

ORGANIZING COMMITTEE
Silvia Bernardini
Dominic Stewart
Federico Zanettin

ADDRESS FOR CORRESPONDENCE
SSLMIT - CULT 2000
Corso della Repubblica 136
47100 Forli, Italy
Tel.: +39 0543 450 307/304
Fax: +39 0543 450 306
e-mail: cult2k at sslmit.unibo.it
WWW: http://www.sslmit.unibo.it/cult2k/

PROPOSALS
Proposals for contributions relating to any of the topics listed above should reach the organizing committee no later than June 15, complete with abstracts of about 500 words. All proposals will be reviewed. For further details about the submission procedure, please refer to the Conference web page or contact the organizing committee at the addresses above.

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:24 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:24 +0200
Subject: Conf: NLP-2000
Message-ID:
From: "Aristides Vagelatos"

ANNOUNCEMENT
2nd International Conference on Natural Language Processing
NLP 2000: Filling the gap between theory and practice
URL: http://www.cti.gr/nlp2000
2, 3 & 4 June 2000, Conference and Cultural Center, University of Patras - Greece

ORGANIZED BY:
- Computer Technology Institute of Patras
- University of Patras: Computer Engineering Department (Database Laboratory), Philology Department (Linguistics Section)
- University of Athens: Informatics Department
- University of the Aegean: Information & Communication Systems Department

OBJECTIVES
We feel that this is the most opportune time for a critical view of the achievements both in theory and in practice, and for developing bridges in order to build emerging advanced systems and services that will provide the breadth of information envisaged. The aim is to fill the gap between theory and practice, so that developments and needs in theory can inform new technological methods and applications, and vice versa. The goal is to bring together people who will attest to the progress of the field and disseminate it to a wider audience.

Conference Secretariat:
Mrs. Penelope Kontodimou
P.O. Box 1421, University of Patras, GR - 26 500 Patras - Greece
Email: pinelop at cti.gr
Tel: (+3061) 960.383
Fax: (+3061) 997.783

Program Committee Chair:
Christodoulakis Dimitris (University of Patras), Greece, E-mail: dxri at cti.gr

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:26 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:26 +0200
Subject: Ecole: CLaRK'2000
Message-ID:
From: Frank Richter

Second announcement: Summer School 2000 in Bulgaria - CLaRK'2000

The Tuebingen-Sofia International Graduate Programme in Computational Linguistics and Represented Knowledge (CLaRK) is inviting applications to a summer school in Sozopol, Bulgaria, this summer. Please note the slight change of dates since the first announcement.

*NEW* Dates: August 25th - September 8th 2000 (days of arrival and departure)
Place: Resort town of Sozopol (Black Sea), Bulgaria
Language: English

Participants: Participants should be doctoral students who research the interfaces between computer science, cognitive science, linguistics, mathematics and philosophy. In exceptional cases, postdoctoral researchers as well as outstanding students in the final year of masters-level studies who intend to pursue a doctorate will also be considered. The summer school is limited to 25 students. Places are allocated competitively on the basis of the research interests of the participants and the perceived benefit to those interests of attending the summer school. Participants must be proficient in English.

Stipends: Via the CLaRK Program, the Volkswagen-Foundation will provide stipends for up to 6 students from the countries of Central and Eastern Europe and 6 further students from Bulgaria. The stipends will be awarded on a competitive basis and will comprise travel costs (up to DEM 600) and room and board for the duration of the summer school. At the discretion of the CLaRK Program, the stipends may include additional support for travel costs above DEM 600.
Costs: Participants who are not sponsored by a CLaRK stipend should anticipate approximately DEM 125 for room and board per day. Costs for transportation to and from the summer school are not included in this estimate. Applications: Applications with a completely filled in registration form (available from www.uni-tuebingen.de/IZ/application.rtf), a curriculum vitae, and a short (maximum three pages) summary of relevant past and present research and education must be submitted to the Office of the International Centre at Tuebingen by 30th April 2000. Applicants should indicate whether they are applying for a CLaRK stipend. The event number that the registration form asks for is 5. CLaRK stipend applications must include a letter of recommendation with their application. Internationales Zentrum fuer Wissenschaftliche Zusammenarbeit Universitaet Tuebingen Keplerstr. 17 D - 72074 Tuebingen Tel.: (0049) 7071 / 29 - 77352 or /29 - 74156 Fax: (0049) 7071 / 29 5989 e-mail: iz at uni-tuebingen.de WWW: www.uni-tuebingen.de/IZ/starte.html Content and Goals Computational linguistics and knowledge representation are two distinct disciplines that share a common concern with what knowledge is, how it is used, and how it is acquired. However, though knowledge representation and computational linguistics clearly address broadly similar research problems, research within each of these fields has hitherto been largely ignorant of research in the other. Moreover, the ignorance the two fields have of each other both fosters and is fostered by a wide gulf between the educations received by students of knowledge representation and students of computational linguistics. The goal of the summer school is to help bridge this gulf by introducing the summer school students to recent developments in the interdisciplinary field of computational linguistics and knowledge representation. The summer school will take the form of courses in various topics. 
The program provisionally includes courses in computational morphology, corpus linguistics, declarative knowledge representation, natural language semantics, Slavic syntax and psycholinguistics.

Preliminary Course Program
Erhard Hinrichs, Sandra Kuebler: Computational Tools for Corpus Linguistics
Valia Kordoni/Frank Richter: A Comparison of LFG and HPSG
Anna Kupsc: Slavic in HPSG
Detmar Meurers: Introduction to HPSG
Janina Rado: Introduction to Psycholinguistics
Kiril Simov/Gergana Popova: Computational Morphology
Kiril Simov/Atanas Kiryakov: Declarative Knowledge Representation
Kiril Simov/Atanas Kiryakov: WordNets: Principles and Applications

A short description of the courses can be found on the CLaRK web pages, http://www.sfs.nphil.uni-tuebingen.de/clark/

The expected guest speakers include Nicola Guarino from the University of Padova, Italy (www.ladseb.pd.cnr.it/infor/people/Guarino.html).

Contact for further information:
Kiril Ivanov Simov (Sofia): kivs at bgcict.acad.bg
Frank Richter (Tuebingen): fr at sfs.nphil.uni-tuebingen.de
WWW: http://www.sfs.nphil.uni-tuebingen.de/clark/

From pb at lpl.univ-aix.fr Tue Apr 25 17:56:28 2000
From: pb at lpl.univ-aix.fr (Philippe Blache)
Date: Tue, 25 Apr 2000 19:56:28 +0200
Subject: Projet: Ontology
Message-ID:
From: Patrick Cassidy

April 22, 2000

The following note is a follow-up to some discussions held at the meeting of the Association for Computational Linguistics (ACL) last year, and is now being brought to the attention of a wider group. It is being sent to a number of different list servers, as well as to the membership of the ACL, and I apologize for what will inevitably be some duplication. Please send all comments directly to me.
Best regards,
Pat
=============================================
Patrick Cassidy
MICRA, Inc.
735 Belvidere Ave.
Plainfield, NJ 07062-2054
(908) 561-3416 || (908) 668-5252 (if no answer) || (908) 668-5904 (fax)
internet: cassidy at micra.com
=============================================

To: Members of the Association for Computational Linguistics and others with an interest in knowledge representation, lexicons, and lexical semantics
From: Patrick Cassidy (cassidy at micra.com)
Subject: A Request to Participate in a Study of the Utility of a Standard Ontology and Lexicon for Natural Language Understanding (NLU) and database interoperability

==============
Background
==============
In recent years there has been a great deal of effort devoted to building lexicons, ontologies, and terminologies, both for the purposes of basic research and for practical applications. The advantages of common formats and common content, which allow reuse of results between groups, have been widely recognized, but the practical funding situation has required in most cases that individual groups focus on relatively narrow aspects of the general problem. Efforts have also been under way for years within and between a number of groups to develop common resources to promote interchange of data and comparison of results, and to reference and organize the output of the many groups who have prepared valuable resources. These very valuable projects have helped mitigate the difficulty of preparing and finding useful ontologies and lexical resources. However, there is still little prospect that these multiple projects will lead in the near future to a unified common ontology and lexicon that has sufficient detail and functionality to be adopted by a large number of groups as a reference standard, and which can be used directly, without substantial modification, for a variety of purposes in research and practical applications.
Of special value would be the development of a common defining vocabulary of concepts and associated words and relations that would be sufficient to define all of the specialized concepts and words used in applications. The ability to use a common vocabulary to define the concepts and words in diverse applications will provide a level of interoperability unavailable by any other means, except for one-by-one coordination between projects. The question arises whether it is now possible to build on the large body of existing data and experience, to construct such a reference standard within a tightly coordinated single project. The goal will be to create a database that is as inclusive as possible of all of the results and intuitions resulting from previous research and development efforts, and to include as many as possible of the current practitioners within the project to build this resource. The main problem is that development of a basic but realistically large ontology and lexicon for Computational Linguistics research will require a project to coordinate a group -- probably a consortium of dispersed academic and industrial participants -- of a size that will require substantial funding. Though large by the standards of most NLP research projects, such a coordinated effort would still be modest by comparison with funding for important research tools in other areas of science, such as space probes, particle accelerators, or telescopes. Skepticism about the possibility of congressional funding for such a project is understandable, but there is ample precedent for obtaining special congressional funding of tools for research. What is needed is to show that the costs will be repaid by the usefulness of this database both for research and for construction of advanced applications. At a minimum there should be a survey to identify the potential users of a standard ontology and lexicon. 
In the eventuality that special congressional funding could not be obtained, this will still be useful to help move toward building common resources by other means. At the annual meeting of the ACL in Maryland in June 1999 I helped organize a "birds-of-a-feather" meeting to discuss whether there is at present a need and an opportunity to build a large but basic ontology and lexicon for use in NLU research and applications. Among the 23 that participated in the discussion, most had expended some effort building lexicons and ontologies for natural language understanding, but some members were present who had not themselves participated directly in such efforts. We spent over an hour discussing mostly the technical question of what kind of ontology could be useful for natural language understanding, and the political questions of whether it would be practical to attempt to get agreement at this time among ontology developers with different views of how to proceed. The view was almost unanimous that such a project should be attempted, though it was recognized as technically and organizationally complex. There was also a large degree of skepticism as to whether we could convince congress to fund such a large project. We had hoped to be able to have a wider discussion among the general membership of the ACL, but as it turned out the general business meeting ran well over its allotted time, and when I raised the issue there was no time for discussion, so a motion was made and passed that I should form a committee to study the question and report back to a future meeting. This note is the first request for participation in such a committee. The question of construction of a reference ontology for Computational Linguistics and for database interoperability has already been discussed over several years within the ANSI T2 ad hoc committee on ontologies. 
That ad hoc committee is no longer actively meeting, and this note and its suggested formation of a study committee is in part an attempt to fill the void left by discontinuation of those discussions. One of the conclusions of those discussions was that substantially increased funding would be needed for a coordinated effort, in order to move the development of useful ontologies beyond the current stage in which isolated groups each pursues its own ideas, which are generally incompatible with or very difficult to merge with those of other groups. The present note is intended to bring the issues addressed by the T2 committee to a wider group, and to form a committee that can develop objective information that would provide justification for the substantial funding needed for a unified project. As mentioned, the complexity and size of such a project, which would require a tightly coordinated effort with funding substantially larger than a typical CL research project, makes it likely that special funding would have to be obtained directly from congress. To obtain such funding it will be necessary to show that there is a significant group of established researchers who have been active in building lexicons and ontologies, and who believe that building a standard reference is technically feasible at present, and that such a reference would be used widely enough to justify the expense. One can find expressions of such a belief in private conversations and in published papers, as well as in the existence of research efforts to build common lexical and ontological resources. To begin the process of developing a well-organized proposal that can be considered seriously by congress, what is needed is a more formal study to present the findings of a broadly representative group rather than of an individual or single research group. This request for participation in this study is only a first step in developing such a proposal. 
The specific purposes for organizing this committee and the subjects for discussion are: (1) to determine the general characteristics of an ontology and lexicon that would incorporate as much as possible of the results and insights of those who have already spent many years doing research on lexicons, ontologies, knowledge representation, terminologies, and lexical semantics, and would be broadly useful for both research and applications; and (2) to estimate where and to what extent such a database, if built, would in fact be used. Quantitative data about potential areas of use would be especially valuable, to demonstrate that construction of such a database would be worth the cost. The structure of this committee is open to discussion. I would suggest that anyone with experience in any of the relevant fields should be able to vote on any proposals for which a measurement of opinion is needed, and those individuals wishing to participate as voting members should inform me of that before the end of May. Discussions will be conducted by e-mail (I will forward comments to a list of interested persons), unless someone is willing to set up a listserver for this purpose (perhaps an existing listserver should be used?). Individuals willing to prepare a report of the potential uses of a defining ontology/lexicon in specific areas of research or in applications would receive and summarize copies of any data or suggestions relevant to their area, sent from any interested person. The number of possible summaries is not limited, but will probably be small. Any individual is free to make any comments, and all comments received will be forwarded to anyone wishing to receive them, unless they are specifically intended not for distribution. I do not anticipate that at this stage any degree of agreement could be reached about any details of the structure of a common ontology or lexicon, but some summary could be prepared of the various alternatives that might be suggested. 
I hope that at the NAACL-2000 meeting in Seattle in the first week of May, some preliminary indication could be obtained about how many individuals would be willing to participate as voting members and/or report writers. I do not have a fixed timetable in mind, but probably three months will be sufficient time for interested parties to determine potential uses and send in comments. The timing of subsequent actions will depend on the wishes of the voting members of the committee. All persons interested in this project in any way should contact me by e-mail (cassidy at micra.com) or telephone (908-561-3416). Suggestions about how to organize an informal study of this type would also be welcome, but need to be sent soon to be useful. It will be worthwhile to include in this study a summary of all ontological and lexical resources currently available, and I hope that some representative of every group that has built any form of ontology, terminology, or other lexical resource, which is now available to the public or might become part of a common reference ontology/lexicon, would send me a brief summary of their projects and a reference to the location of any existing data available publicly. There are already several web sites on which pointers to the locations of such resources are listed, and the owners of those sites and those who have prepared other lists of available resources are encouraged to send a copy of the lists they have already prepared. The complete summary of references to such resources submitted will be published as part of the report of the committee. The data that are most needed to determine potential utility of a reference database will be estimates of how much such a common ontology or lexicon would be used. For this purpose, anyone who would be likely to even try using it should send a note indicating the type of system in which it would be used and how it would be used, and how much more efficiently the system might function. 
I would expect that anyone currently using an ontology or semantic network would want to try such an ontological lexicon, and if there are those who would not try it, the reasons for their skepticism will probably serve as useful input. One of the important questions to be answered is whether one can estimate potential utility in quantitative terms, and if so, how. The likelihood of the ontology being used in one's own system may be expressed in any way, but at least three levels can be distinguished: (1) those who would be willing to participate in construction of such an ontological lexicon; (2) those who would be likely to adopt a standard ontology or lexicon, if it existed; and (3) those who would try using a standard ontology or lexicon, to test its utility. Descriptions of potential commercial uses would be especially valuable for convincing congress that funding is justified. For example, estimates have been made that electronic commerce over the internet will amount to 425 billion dollars by 2001 (Michael McCandless, "Let's Go Shopping," IEEE Intelligent Systems, Jan/Feb 1999, pp. 2-4). Labor costs in sales transactions tend to run about 10%, so the costs of executing those transactions would be about 40 billion dollars. If these costs could be reduced by 1% through efficiencies generated by the use of a standard knowledge representation scheme, the savings would amount to about 400 million dollars per year. The total cost of the development of such an ontology would then be paid back in less than 6 months. One can make similar estimates for other activities that use advanced computer programs, and find similar likely savings. Thus even a minuscule improvement in the efficiency of computer programming or the use of computer programs would appear to make this project cost-effective.
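The back-of-the-envelope payback estimate above can be reproduced in a few lines. The figures are the projections quoted from the IEEE Intelligent Systems article, not measurements, and the note rounds the intermediate 42.5 billion down to "about 40 billion":

```python
# Payback estimate sketched in the text above; all inputs are the
# article's projections, not measured data.
ecommerce_volume = 425e9   # projected e-commerce volume by 2001, USD
labor_share = 0.10         # labor costs as a share of sales transactions
efficiency_gain = 0.01     # assumed 1% cost reduction from a standard ontology

transaction_costs = ecommerce_volume * labor_share    # 42.5e9, rounded to "about 40 billion" in the text
annual_savings = transaction_costs * efficiency_gain  # ~400 million USD/year as stated
print(f"annual savings: ${annual_savings/1e6:.0f} million")  # prints: annual savings: $425 million
```

Whether such a uniform 1% gain is realistic is exactly the kind of question the proposed study is meant to put numbers on.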
However, estimates of this type will be far more convincing if they come from people involved in the development or use of programs that have (or should have) semantic elements, who could provide more accurate, objectively based estimates for specific examples. In the best case, an industrial group that maintains a database already using an ontology to enhance its functionality might estimate, for example, that an ontology of the type described would likely improve the efficiency of the program by, say, 5%. That number, multiplied by annual sales of the program, would provide a crude estimate of economic benefit. There are several obvious difficulties in making such estimates, starting with the fact that we don't know what the final database will look like. But even very crude estimates from people familiar with a potential use will be better than wild guesses from those with little familiarity. Groups that have already built an ontology or a semantic lexicon can review the development costs of their own system and determine, if a common ontology would be useful, the direct cost savings of adopting a standard ontology rather than constructing an enhanced version of their own system. Even without an economic justification of that type, building this database should be justifiable if it is used primarily as a research tool. Accordingly, I hope that we can obtain comments from all individuals who would be likely to use such a tool in their research or in building applications, as well as from those who wish to comment on the desirable structure of such a database. I plan to organize a birds-of-a-feather meeting at the upcoming NAACL-2000 conference in Seattle (April 29-May 3) where those who are willing to consider serving on this committee can meet, and discuss questions of form and substance of a study such as this, as well as any comments that have been received at that point.
Accordingly, responses should be sent to me by e-mail if possible before the 27th of April, or they can be presented and discussed at the meeting in Seattle. This study will continue for at least three months, so comments will be welcome, and are likely to be valuable, after the meeting as well. In the discussions I had concerning this topic with other attendees at the 1999 ACL meeting, the first question was of course what type of ontology is being proposed. The general structure as well as detailed technical questions can only be resolved in the course of preliminary discussions among those who will participate in the construction of the database, as well as in the construction phase itself. But for the sake of discussion, I have described below some characteristics that will likely need to be included in such a database. The final form of the ontology, if it is to be useful for Computational Linguistics, will have to include substantial lexical knowledge, or will have to be tightly integrated with lexicons built separately. Rather than calling it an "ontology," it might better be referred to as an "ontological lexicon," although there should be a core conceptual component in the ontology that is language-neutral. One of the purposes of forming this committee is to obtain a wider range of comments concerning desiderata for the structure of such a database. In addition to questions about how such an ontological lexicon would be structured, many at the ACL meeting had other questions. I have reproduced below most of the questions that were asked, and indicated some potential answers. It may well be that nothing suggested here will ultimately be accepted unchanged in the final result of construction of this database, but the important point is that construction of such a database will be essential to provide a common tool permitting more effective widespread collaboration in research toward human-level understanding and generation of language.
========================================
What Kind of Ontology is Being Proposed?
========================================

What is being discussed here is the need for a database having two main components: (1) an upper ontology of fundamental concepts, represented in logical format, which are sufficient to serve as the building blocks for construction of all of the more complex concepts that are used in any given field; and (2) a basic lexicon of defining words, in which the word meanings are represented using the same set of fundamental concepts, and which are sufficient to define all of the words of the language. Each word in the lexicon will also have an associated definition using the defining vocabulary, which will in some cases look like an ordinary dictionary definition. Over time, both the ontology and lexicon can be expanded to include more specialized or less common concepts, but the main goal for the initial phase should be to specify the minimum set of defining concepts, semantic relations, and axioms for the ontology, and the minimum set of defining words for the associated lexicon.

This description sidesteps some controversial issues regarding what constitutes "words" and "definitions". It is understood that many polysemous words have vague or plastic meanings, dependent on context, and for such words an exhaustive list of meanings cannot be specified; many words also cannot be defined by necessary and sufficient conditions. All that can be recorded in a database of this kind are the necessary characteristics of word meanings, and perhaps some markers indicating when variations in meaning can be expected in linguistic usage. This will be an attempt to record as much as can be agreed on about basic words and concepts at the present state of the field. Applications that need to handle ill-defined words will need additional structure beyond what can be included in a standardized lexicon.
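To make the two-component structure concrete, here is a minimal Python sketch of how an upper ontology of defining concepts, and a lexicon whose entries point into it, might be organized. All class names, concept names, and entries here are invented for illustration only; the actual schema would be decided by the project participants.

```python
from dataclasses import dataclass, field

# Purely illustrative sketch of the two proposed components:
# (1) an upper ontology of defining concepts, and
# (2) a lexicon of defining words whose meanings point into it.
# Every name below is a hypothetical example, not a proposed standard.

@dataclass
class Concept:
    name: str                  # language-neutral concept identifier
    parents: tuple = ()        # more general concepts (is-a links)
    relations: dict = field(default_factory=dict)  # other semantic relations

@dataclass
class LexicalEntry:
    word: str      # surface form (single word or fixed collocation)
    pos: str       # part of speech
    concept: str   # concept in the ontology that this sense denotes
    gloss: str     # human-readable definition using the defining vocabulary

# Tiny upper-ontology fragment
ontology = {
    "Entity": Concept("Entity"),
    "PhysicalObject": Concept("PhysicalObject", parents=("Entity",)),
    "Artifact": Concept("Artifact", parents=("PhysicalObject",),
                        relations={"made-by": "Agent"}),
}

# Lexicon entries defined against the same concept inventory
lexicon = [
    LexicalEntry("tool", "noun", "Artifact",
                 "an artifact used to perform work"),
]

# Every lexical sense must resolve to a concept in the ontology
assert all(entry.concept in ontology for entry in lexicon)
```

The key property the sketch illustrates is that the lexicon and ontology share one concept inventory, so a word's definition and its logical representation stay in step.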
The conceptual component of this database would be equivalent to an "upper ontology" or "top ontology" (although this term is used by different people to indicate ontologies of somewhat different sizes). Specifying the meanings of words using a basic ontology of this type constitutes in effect a theory of the meanings of the words. A realistic lexicon will need to include not only single words, but fixed collocations and probably also word combinations that are not normally considered idioms but have some non-compositional character.

The lexicon can include not only the word meanings in logical format, but any other data associated with word meaning or usage which is useful for applications. For example, in addition to part-of-speech or etymological data, the lexicon could include verb case frames, which would duplicate to some extent the data in the verb definitions, but in a different format, perhaps easier to use for some purposes. Statistical data on word associations would be another useful component; though not essential, it could easily be included when available. Specifics of what will be included and how the data will be structured can only be decided by those participating in the construction of the database; the remaining comments in this section are personal suggestions, which may not be adopted by the project participants.

The conceptual elements in the ontology will be defined in a logical format, but there are two principles which could make the database more widely acceptable and easier to use: (1) concepts which are not lexicalized in any language as single words or fixed collocations can be included in the ontology, but should be used only where there is some cogent need, and all concepts in the ontology should have an associated definition in some language (usually English); (2) ideally there will be a "definition parser" that can take such a defining string and produce the logical structure that it is intended to define.
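The "definition parser" idea can be illustrated with a deliberately toy sketch that maps a defining string of the classical genus-plus-differentia form onto a crude logical structure. The pattern and the output format are invented here; a real definition parser would require full syntactic and semantic analysis of the defining string.

```python
import re

# Toy illustration of a "definition parser": map a defining string of
# the shape "a/an <genus> that <differentia>" onto a crude logical
# structure.  Both the pattern and the output format are hypothetical.

def parse_definition(defn: str):
    m = re.match(r"(?:a|an)\s+(\w+)\s+(?:that|which|who)\s+(.+)", defn)
    if not m:
        return None  # defining string not in the recognized shape
    genus, differentia = m.groups()
    # roughly: (isa ?x Genus) conjoined with an unanalysed differentia
    return {"isa": genus, "such-that": differentia}

print(parse_definition("an artifact that is used to perform work"))
# {'isa': 'artifact', 'such-that': 'is used to perform work'}
```

Even this toy version shows the intended round trip: the human-readable definition in the lexicon and the logical structure in the ontology are two renderings of the same content.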
The emphasis in this project is on the most general words and concepts, so that a common defining vocabulary of concepts can be developed which, if used for defining terms in specific applications, will allow some significant level of conceptual communication between applications developed by independent groups. Applications that process complex information but are not required to understand linguistic phrases, such as database applications or electronic commerce, can use the ontology and in theory could ignore the lexicon. Linguistic applications would use the lexicon and, if any level of conceptual understanding is required, would also use the word definitions in logical format, which will usually also require the use of the basic ontology. (In some cases a linguistic application may use the lexicon and associated definitions with minimal reasoning, and the lexicon would function in such cases as a thesaurus or simple semantic network, such as WordNet.)

Different ontologies have already been developed by a number of different groups for various purposes, but in general their structures are so different that transferring information from one system to another is very time-consuming or error-prone. The difference between this ontological theory and others which have been proposed thus far lies mostly in the size of the database and the extent to which it will both include and represent a consensus of the different theories (i.e., ontologies and lexical semantic representations) that have been developed thus far by independent groups. What would be very useful for both research and applications development is to have at least one well-developed defining vocabulary freely available to all potential users, constructed by representatives of most or all of the existing ontology and lexicon groups and containing as much as possible of the compatible information which each of these groups could contribute to a common effort.
In addition to the core database, user interfaces and applications programming interfaces should be developed, as an integral part of the project, to make the database as easy as possible to learn and use. The representations of the concepts, and through them the meanings of words, will need to be specified ultimately at a logical level that will allow automatic reasoning. The existing Knowledge Interchange Format (KIF) and Conceptual Graphs (CG) standards could serve as well-defined theory-neutral formats for storing the meaning representations. To be useful for computational linguistics, a considerable amount of lexical information should also be included. This distinguishes the proposed database from that of CYC, which placed primary emphasis on utility in reasoning. Another important distinction is that the database must be in the public domain, or at least freely and easily available over the internet for research, as the WordNet system is. Without free availability to any potential research or applications group, developing the necessary agreements between groups may be impossible, and most of the utility will be lost.

The ontology that will emerge from such a project will most likely have some variant of the typical structure of a set of entities connected by relations, since this is the basic model of meaning representation which has been universally adopted, though with some significant differences between implementations. The relationships may be thought of as semantic relations or as axioms of the ontology, but it is understood that to be useful for reasoning the semantic relations must be defined with sufficient precision that the logical implications of one entity having a specific relation to another can be calculated unambiguously. Although in many ontologies the hierarchy has received the most attention, it is equally important that the semantic relations be fully agreed upon and well-defined.
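The requirement that a semantic relation's logical implications be calculable can be illustrated with the simplest case: an is-a link declared to be transitive, so that the implied superclasses of any entity can be computed mechanically and unambiguously. The concept names below are invented examples.

```python
# Minimal sketch: a relation ("is-a") whose logical behaviour is
# precisely defined (transitive), so a program can compute all of its
# implications unambiguously.  Concept names are invented examples.

isa = {
    "dog": "mammal",
    "mammal": "animal",
    "animal": "entity",
}

def ancestors(concept: str):
    """All concepts implied by following the transitive is-a links."""
    result = []
    while concept in isa:
        concept = isa[concept]
        result.append(concept)
    return result

print(ancestors("dog"))   # ['mammal', 'animal', 'entity']
```

A relation whose logical behaviour were left vague (is "part-of" transitive? does it imply location?) would not support this kind of mechanical inference, which is the force of the precision requirement above.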
The set of basic concepts and semantic relations needed will be those which are necessary and sufficient to provide logical definitions of any of the concepts, and by extension words, which will be used in applications. In effect, what is needed is to create a dictionary with definitions of the words, and a parallel ontology with the same definitions expressed in a logical format suitable for automatic reasoning. The lexicon that labels the concepts of the ontology should include all of the basic words that are needed to define all of the other words of the language; the "words" of the language must eventually include all collocations which are to any degree non-compositional, that is, whose meanings cannot be deduced as a predictable combination of the meanings of the individual component lexical strings. The lexicon cannot at the initial stage be comprehensive, but it should also contain those common collocations, such as those produced by the lexical functions of Mel'cuk, which are either essential for generation of fluent colloquial language, or so commonly used that their inclusion will improve the speed or accuracy of the language understanding process.

As a practical matter, to demonstrate the potential uses of such an ontological lexicon and to facilitate development of a user interface that will permit widespread use, there should be a detailed implementation of this basic defining vocabulary to define specialized concepts in at least two different areas. Two that come to mind are, for example, the medical area, where the basic defining vocabulary could be integrated with the UMLS system and its metathesaurus; and the military area, where significant effort has already been expended to apply the CYC ontology. These two are by no coincidence areas of interest to governmental agencies. Integration with other specialized ontologies or lexicons might be proposed and performed by individual groups as part of the project.
Enterprise models, manufacturing, electronic commerce, or planning ontologies would be additional candidates. The primary motivation for developing a common theory of meaning is to allow a greater degree of re-use of research results in computational linguistics, as well as more direct communication between different implemented systems which have a linguistic or conceptual component.

============================================
Why do we need a common defining vocabulary?
============================================

Any difference between two systems in the internal representation of words or concepts must inevitably lead to some difference in the inferences that the two systems make from the same data. Thus without some common basis for defining the meanings of the different concepts used in different systems, the transfer of knowledge between systems will be impossible, time-consuming, or highly error-prone. The need for a common vocabulary of defining concepts is felt not only in the field of natural language understanding, where communication is the primary goal, but also in other fields of Artificial Intelligence, wherever conceptual information painstakingly entered into one system could be useful in another system.

It is clear that in some areas of research in Natural Language, semantic representation of word meanings is less important than in others. Research in speech-to-text conversion, for example, and in parsing methodologies, has progressed without the use of semantics. Statistical methods have also been shown to be useful for some practical purposes, though the extraction of the meanings of texts is beyond the capabilities of such a methodology by itself. It is also true that groups doing research with systems which will not interact at a conceptual level with other systems have a great degree of freedom in choosing representations of meaning which may be suitable for their purposes even if not usable in other systems.
We would hope that groups whose research does not immediately require detailed semantic representation of meanings will nevertheless recognize its importance for the progress of research in language understanding, and not raise objections to this project unless the objections address the feasibility of the goal. The developers of an ontological lexicon will be those groups working specifically on methods to represent word meanings, but the need for a common representation of meanings of words and texts is felt directly also by those whose research involves some level of understanding, such as in information extraction, message understanding, word sense disambiguation, text categorization, machine translation, and database interoperability. The difficulties caused by a lack of common conceptual representations affect not only NLU and the database and expert systems to which CYC has been applied; they affect many areas of AI.

In a recent issue of IEEE Intelligent Systems (January/February 2000) several commentators discussed the state of AI, and some of those comments reflect this problem indirectly. Nils Nilsson commented that "AI shows all the signs of being in what the late Thomas Kuhn called a pre-paradigmatic, pre-normal-science stage. It has many ardent investigators, arrayed in several camps, each claiming to have the essential approach to intelligence in machines. . . . It might be that intelligence is the kind of multiplex for which no single science or paradigm will ever emerge." Donald Michie stated: "The most notable nontrend [in AI] has resulted from consistent disregard of the closing section, Learning Machines, of Turing's 1950 paper. A two-stage approach is there proposed: 1. Construct a teachable machine. 2. Subject it to a course of education.
Far from incorporating Turing's incremental principle, even the most intelligent of today's knowledge-acquisition systems forget almost everything they ever learned every time their AI masters turn to the next small corner of this large world." A common basis for representation of knowledge will help to overcome these problems, and help to move the field more toward the normal scientific paradigm, enabling more rapid advances by allowing investigators to investigate the same phenomena and compare details of results more directly. In computational linguistics research, having at least one common detailed theory of word meanings for the defining vocabulary will provide a powerful tool for progress toward the ultimate goal of human-level language understanding.

===============================================================
Wouldn't it be better to develop a common ontology cumulatively
by contributions from existing research groups rather than try
to build a larger unified project?
===============================================================

The construction of an ontological lexicon for natural language understanding is different in several important ways from most areas of scientific research, where ideas and results from small independent groups provide the bulk of the individual contributions that evaluate or elaborate the theories of each field. The predominance of original contributions from small groups is true in most areas of natural language research as well, but for construction of a large ontology and lexicon for use as a research tool, the usual research process is less effective. The main problem is the size and complexity of a realistic ontology, and the intimate and multiple interrelations of its component parts. To specify the meanings of the defining vocabulary is to build a fundamental ontology of concepts and then to construct a theory of the meanings of words using those concepts.
This endeavor has more of the character of an engineering project than of a research project, in that it is the construction of an artifact which has many complex interacting parts. It may in theory be possible to achieve the same result eventually through small independent contributions of ideas and elements, but such a process is likely to be much slower than a coordinated project, and will be less likely to achieve the goal of a widely accepted reference standard within any foreseeable time frame. In addition, the time lost in pursuing the development of a common ontology through uncoordinated effort may well prove eventually much more expensive, through the lower efficiency both of research and of implemented programs developed in the interim, than would the development of the same database by a single adequately funded coordinated effort. Furthermore, the problems of coordinating groups with different approaches to ontology development, admittedly difficult even in a single properly funded project, might well be insurmountable without the impetus of deadlines for agreement on specific subproblems within an overall plan of development.

One possible alternative is the elaboration of an existing ontology, such as WordNet, by the cumulative addition of new functions or data. This will, one may hope, proceed in any case until a coordinated project is funded. But in order for the additions to accumulate into a unified system, there would still need to be a prime coordinator - in this case presumably the WordNet group. Their own views would then necessarily predominate, and since these have been driven by specific goals and objectives, which are different from the goals of other groups, the resulting database would not represent the best common approach to the varied problems, as would a project initiated de novo for the specific purpose of answering a wide range of research and practical goals.
It is also difficult to imagine that the total cost of proceeding in that fashion would in the end be any less than that of a single coordinated project, which would also incorporate input from WordNet as well as from other existing systems. The worst-case scenario is one in which several commercial concerns develop proprietary versions of a natural-language ontology, of which the largest part is not publicly available. That is currently the case with the CYC project, and it appears to be the direction in which Microsoft's "MindNet" project is heading. If such a situation develops, there will not be one but several competing "standards", none of which will be easily available to researchers, and which, even if available to some degree, will not be able to be enhanced and redistributed by most of those who could improve such a system. Such systems will not serve the purpose of providing a common test bed in which new ideas for representing word meanings can be tried by many research groups in realistically large systems, with results distributed to the research community at large. Proprietary systems are also likely to be less reliable than a public one, and their behavior unpredictable to anyone outside the development group.

=================================================================
Would non-U.S. groups be eligible to participate in this project?
=================================================================

Much important work on ontologies has been performed outside of the U.S., and I would expect that participation by non-U.S. groups would be welcomed; indeed it would be essential if the resulting ontology, which should be language-neutral, is intended to serve as a standard throughout the scientific community.
Since the emphasis would be on creating a defining vocabulary of general concepts sufficient to define all specialized concepts, the experience of those whose native language is other than English will be particularly valuable in recognizing when useful basic concepts are lexicalized in one language and not in others. There are already several European projects which are aimed at the construction of common ontological and lexical resources, and it would be a great loss if those groups did not participate in an inclusive effort. The language-specific elements of the lexicon will of necessity concentrate first on English, since creating a computational lexicon even of one language is already a very large task. Groups from the UK could of course work on the English lexicon. But if at all possible, groups with experience in automatic translation or other multilingual applications should be requested to participate, since some of the more subtle and difficult problems in knowledge representation may be highlighted by the difficulties found in accurate translation. It is difficult to predict to what extent the inclusion of lexicons for other languages will be feasible; groups which presently concentrate on translation will presumably want to include their parallel lexicons for languages other than English. Ideally, the European research funding agencies might fund European groups willing to coordinate their work with this project, who could concentrate on non-English languages.

================================================================
My notions of how to represent concepts change every few weeks.
How can we fix on a single representation at this time? Do we
know enough at present to justify a major project?
================================================================

It goes without saying that an ontological lexicon, like the language it represents, will change over time, but a legitimate question is at what point it is appropriate to undertake a first effort to construct a standard tool that can be used and tested by the entire research community. There have not been any major fundamental changes in the prevailing entity-relationship paradigm for representing knowledge over the past ten years, and the paradigm has been sufficiently well investigated at a fundamental level that there seems to be no reason to delay trying to build a consensus ontological lexicon based on the best knowledge now available. This will provide a research tool that can help to discover the strengths and weaknesses of different aspects of this paradigm, and it can include all the elements deemed important by those who have been studying meaning representation for some time. The database can then be thoroughly and widely tested for conformity to the realities of language use, and for utility in reasoning about data.

The main motive for this project is the observation, from prior experience, that the fundamental concepts of any language are so intimately connected with each other that no theory of the meaning of any of its component concepts can be tested in a realistic setting unless some consistent representation of the entire fundamental vocabulary is available. We therefore need some starting point with a realistically large database representing most of the fundamental concepts of a language, in order to make effective tests of whether any specific individual components conform to the way people actually use words and concepts.

================================================================
For how long will the ontology constructed be useful? Isn't it
likely to change and need modification or replacement?
================================================================

Based on the lifetimes of existing ontologies, we can expect that a major effort at developing a standard ontology will result in a database that will be useful for research and practical purposes for at least ten years. To avoid becoming outdated, the ontological lexicon will need a core group to provide a continuing maintenance effort, though perhaps at only a fifth of the intensity of the initial development. It is conceivable that eventually some fundamentally different structure for meaning representation will be proposed and widely accepted, in which case it would be difficult to predict how much of the structure of this proposed ontology would be reusable. But more likely the ontology will continue to be useful for decades through modification, replacement, or addition of new components, with most of the structure remaining stable for years. It is also unlikely that any new meaning-representation paradigm could gain wide acceptance unless some substantial effort such as this provides a basis for thorough testing of the entity-relation model on a realistic scale.

As a theory of the meaning of words, this database will doubtless be modified and elaborated, as are most scientific theories. Theories in general are tools for organizing research; they provide a framework in which to formulate tests to confirm or refute aspects of the theory. They are useful for a time to make collaborative research on a topic possible, after which they may be modified or abandoned. In a theory with as many individual parts as an upper ontology, we can assume that some parts will be found inadequate for some purposes, while others may remain unmodified for a long time. The core maintenance group, or perhaps a committee with broad representation, would be responsible for making and publicizing the changes in each new revision.
Having this theory easily available to the entire research community will maximize the likelihood of finding and addressing inadequacies in its structure.

=============================================================
Ontologies have not been shown to be notably useful for NLU.
Why spend resources building a bigger one?
=============================================================

There is apparently a widespread notion that ontologies, and specifically the CYC ontology, have been tested for utility in Natural Language Understanding and have not proved useful. It is important to address this perception. In fact, attempts to use CYC in natural language have been very modest in terms of time spent, and the main virtue of CYC, its logical structure, has scarcely been tested at all in NLU applications. It is also important to recall that CYC was not designed with use in NLU as a primary objective (as the ontological lexicon suggested here would be), although Lenat had expected it would be useful for that purpose.

CYC has two other important flaws which would not apply to an ontology built as suggested here: (1) CYC was built by a single group with a specific viewpoint, and did not include input from many other practitioners of diverse schools of knowledge representation, ontology, and lexical semantics. Regardless of its internal consistency, it cannot serve as a focus to bring together a large number of groups to use it as a common reference standard. (2) Most of CYC is not publicly available, and use of CYC presents difficult legal issues. Although it can be useful for specific industrial contractors, its lack of public availability makes it unsuitable for use as a research tool; even when it is made available to academic groups, detailed results of research cannot be freely described, nor modified versions redistributed to other groups.
The study that may most directly account for the perception of CYC's inadequacy was performed in 1996 by Nirenburg's group at NMSU ("An assessment of Cyc for Natural Language Processing", MCCS-96-302, available on the Web at: http://crl.nmsu.edu/Research/Pubs/MCCS/Abstracts/mccs-96-302.htm). This study of the utility of CYC for Natural Language research found that several desirable features were absent. It did not, however, suggest that the existing structure could not be used, but rather that it needed additional components or structures to be more useful. It did not draw any negative conclusions about ontologies generally, and indeed that study group has its own ontology which it finds more directly useful for its purposes.

Perhaps of greater relevance is the widespread use of WordNet and EuroWordNet. Although this semantic network does not qualify as a logic-based upper ontology, as would the basic ontology which would be constructed as suggested here, it does contain many conceptual relations which would probably be widely accepted as part of the larger ontological lexicon which could be constructed if adequate funding were available. The wide use of WordNet provides strong evidence that when well-structured and easily usable resources are publicly available, they will prove to be valuable tools for research. This is scarcely surprising, as progress in many types of research is limited by the tools available. Since there has not yet been an ontology constructed with even close to the amount of detail that is needed for understanding of language, it is far too early to draw conclusions as to how useful a fully developed and publicly available ontology would be. One of the purposes of developing a comprehensive ontological lexicon would be to discover how useful the present ideas about knowledge representation really are, without the impediment of multiple small and incompatible sets of data on word meanings.
Smaller ontologies have in fact been shown to be useful to some extent in language-understanding tasks, such as disambiguation, but thus far those available have not been shown to dramatically improve performance. Nor should they necessarily: as mentioned, a comprehensive ontology does not by itself constitute a language-understanding system; there are many additional aspects of language understanding systems that must be developed as well. Although an ontology is not the only component of a language understanding system, or even the main one, and its usefulness depends directly on the systems in which it is used, some form of common ontology is a necessary prerequisite for sharing research results in language understanding wherever the actual meanings of linguistic expressions need to be represented. Many specialized ontologies have been constructed which are not designed to be used in language understanding. But until a common representation of word meanings is used by more than one or two groups, advancement toward human-level understanding of language will be very difficult and is likely to be slow and inefficient.

The proposed ontology is intended to be useful for NLU as well as for other purposes, such as database interoperability. It will therefore need to be connected intimately with the lexicon, and as much as possible of the type of detailed lexical information that is found in Mel'cuk's Explanatory Combinatorial Dictionary will have to be included. As mentioned above, what is needed is better thought of as an ontological lexicon.

====================================================
Would there be any images or graphical information
representation in the ontology?
====================================================

It may be true that some degree of imagery or graphical representation will be required to adequately represent certain concepts or word meanings.
Whether it will be feasible to include such data in the first version of an ontological lexicon will have to be decided by those participating in the organization of the effort. It would be helpful if individuals who have worked on graphical information representation were to participate in this study.

==============================================================
Different people use different internal ontologies, and to
some extent different lexicons. How can we include all of
those differences in a single consistent database?
==============================================================

In order to serve as a completely accurate medium of communication between agents, the word senses of a language must be identical between speaker and listener, or some degree of miscommunication or ambiguity will result. It happens in human-to-human communication that use of words in different senses by different people causes errors in the communication process. It will also be true that in human-to-computer communication similar differences in internal representation will lead to some miscommunication, though this can be eliminated in computer-to-computer communication. Special procedures for recognizing when variants of meaning are being used will probably have to be part of the implementing systems, and may not be includable in the ontological lexicon itself. Words that are commonly used in variant senses, or that have productive polysemous meanings, can be marked as such, and the broadest senses can be included, even though the procedures for recognizing variants of meaning may not be contained within the lexicon. These are the cases where recording collocational use may be especially helpful in disambiguating the sense.
It is necessary first to build a basic lexicon and ontology of words which identifies the most common senses, those used by almost all native speakers of a language, and subsequently to build up and include less common or idiosyncratic variants, wherever such variants have some significant level of usage. The differences in internal lexical representation among individuals, if they are sufficiently widespread, may have to be treated similarly to multiple discrete senses of words, or to the semantic plasticity of polysemous words. In the real world, of course, widely variant use of language can be observed; a psychotic individual may produce a string of seemingly linguistic utterances that are completely uninterpretable by any other person, however skilled in the language used. The project is intended to produce only a basic reference vocabulary, and the recording of highly individualistic, poetic, or idiosyncratic usage of words will be beyond its scope. Most specialized uses will have to be dealt with by specialized systems built to handle such variation in usage. It is the common defining vocabulary which will be the main concern, though the inclusion of some standardized or common uses of specialized technical words will be valuable, limited only by the time and resources available for extending the database core.

=================================================================
Will funding for construction of such an ontology reduce funding for other areas of Computational Linguistics?
=================================================================

In any recommendation made to Congress for funding of this project, it must be strongly emphasized that the creation of a standard ontology/lexicon will not substitute for other aspects of computational-linguistics research, but is only a tool for such research.
The reduction of funding for other aspects of CL research would be counter to the purpose of building the ontology, and would squander a resource built at significant expense. Those who contact funding agencies or members of Congress to recommend this project will need to emphasize this point.

======================================================================
Will recommendations by an ACL committee for congressional funding constitute lobbying and jeopardize the tax-exempt status of the ACL?
=======================================================================

A study of public issues which includes comments on the need for and effects of government action does not constitute lobbying; such studies are performed routinely by institutions and think tanks, such as ECRI, without affecting their tax-exempt status. The ACL will not, as an institution, make recommendations directly to members of Congress. Individuals who are interested in the subject may cite an ACL study to support the need for funding. An unfunded and relatively informal study of this type is unlikely by itself to carry sufficient weight to move Congress to action, but ideally it could prompt the organization of a more formal study of the need for funding of a standard ontology, for example by the National Academy of Sciences, or by think tanks concerned with technical issues whose opinions are valued by members of Congress.

=======================================================================
How can we expect that ontologists and lexical semanticists with different viewpoints could ever be induced to agree on a common approach?
========================================================================

It will indeed likely be difficult to forge agreement on specific issues, but where there is a recognition of the need for compromise, it can be accomplished.
Building research resources is in many respects an engineering rather than a research activity, and the mindset required for such a task is quite different from the attitudes that make for successful basic research. One example of this difference was eloquently narrated in Kip Thorne's book "Black Holes and Time Warps," in which he described the analogous difficulty of coordinating several teams, each accustomed to basic theoretical research, in a new effort to design and build an expensive interferometric detector for gravity waves:

"Within each team the individual scientists had free rein to invent new ideas and pursue them as they wished for as long as they wished; coordination was very loose. This is just the kind of culture that inventive scientists love and thrive on, the culture that Braginsky craves, a culture in which loners like me are happiest. But it is not a culture capable of designing, constructing, debugging, and operating large, complex scientific instruments like the several-kilometer-long interferometers required for success. To design in detail the many complex pieces of such interferometers, to make them all fit together and work together properly, and to keep costs under control and bring the interferometers to completion within a reasonable time requires a different culture: a culture of tight coordination, with subgroups of each team focusing on well-defined tasks and a single director making decisions about what tasks will be done when and by whom. The road from freewheeling independence to tight coordination is a painful one. . . ."

He continues that, with reluctance and with prodding from the funding agency, the freewheeling and independent scientists made the necessary adjustments.
An ontological lexicon for Computational Linguistics is of course a different type of research tool from a gravity-wave detector (and probably of much more immediate practical utility), but the need to build a unified structure that is tightly coordinated and internally consistent may be even greater than for a physical measuring instrument, because of the likely sensitivity of an ontology to inconsistencies between even widely separated parts. Given the imperative for close coordination in ontology construction, is there a plausible way to achieve the necessary cooperation of groups with disparate viewpoints? I will suggest one possible scenario. If the prospect of organizing development of a standard ontology, as suggested here, reaches the stage where funding looks like a realistic possibility, discussions or a conference should be organized among those who would want to participate in its construction, to determine how many of the disparate systems could be integrated into a single consistent system. In such discussions, the teams will develop some appreciation of the likelihood that their own views may or may not be adopted, whether intact or in modified form. Since the most important goal will be to create a database that will be used by the largest number of research teams, at some point disagreements about what formats or approaches to adopt will probably have to be resolved by some form of voting among participating groups, and the project director will need to be able to resolve any issues not amenable to the voting approach. Any group which recognizes that its own approach is incompatible with the majority's and is likely not to be adopted can try to argue for its technical superiority; but if the arguments are not accepted, such a group will face the choice of participating and adapting its own system to the dominant approach, or not participating and continuing its own independent line of research.
There will presumably be some groups interested in exploring novel approaches to knowledge representation that will want to continue along lines different from those adopted by the majority. However, from discussions I have held with people involved in the investigation of word meanings, there appears to be wide recognition of the need for some common database, and many or most are likely to participate in such a project. By the time that project proposals need to be submitted, there should be some preliminary agreement as to the likely outline of the general structure of the database to be developed. Disagreements over details will need to be resolved in the course of actual funded development, but there will need to be some mechanism, whether by vote of an executive committee or decision of a project chairperson, to resolve residual disagreements by fiat. The manner of selection of the project chairperson would ideally include substantial input from the likely participants in the project. To accommodate input from as many as possible of the existing groups, the number of persons funded for this project will likely approach or exceed two hundred over an initial development stage of three to five years. The required funding for a project of that size will be close to two hundred million dollars ($200,000,000) over the five years, that is, some 1,000 person-years at roughly $200,000 per fully loaded person-year. This will almost certainly require a special appropriation from Congress. Other areas of science, including highly theoretical fields with few immediate practical applications, have succeeded in obtaining funding for projects comparable to, and often much larger than, this one (the *annual* maintenance budget of the Hubble telescope is about $200 million). The possibility of congressional funding is realistic, provided that an adequate justification can be agreed upon among practitioners in the field.
That is the purpose of forming this committee, and I hope that all of those who may have some use for an ontological lexicon will respond with information about potential uses, so that we can demonstrate the cost-effectiveness of such a project.

___________________________________________________________________
Message distributed by the Langage Naturel list.
Information, subscription: http://www.biomath.jussieu.fr/LN/LN-F/
English version: http://www.biomath.jussieu.fr/LN/LN/
Archives: http://web-lli.univ-paris13.fr/ln/