From annes at HTDC.ORG Tue Dec 31 20:04:02 1996
From: annes at HTDC.ORG (Anne Sing)
Date: Tue, 31 Dec 1996 10:04:02 -1000
Subject: PARSER CHALLENGE
Message-ID:

A message from Derek Bickerton to the linguistic community and the world at large:

It can't be done! How often have we heard that? It's obvious that no one could devise an effective parser. I mean, isn't it? After all, if it were possible, then Microsoft or some other billion-dollar corporation would have done it already. Wouldn't they?

Wrong. You don't solve problems by throwing money at them. You solve them by trying a different way. Six years ago, my former student Phil Bralich and I started to develop a new theory of syntax. Right then we decided that instead of presenting our ideas to the academic community, we would put them to a far sterner test.... THE MARKETPLACE!

So we have developed a parser. We don't claim this parser is perfect--yet. All we claim is that it is the best parser presently in existence, that it will outperform any other parser presently in existence, and that within a relatively short space of time we will be able to parse all possible types of English sentence. So confident are we of this that we are now ready to show you what we can do on our web site. We are also willing to propose a minimum set of standards that any parser must meet in order to call itself "commercially viable."

Don't take our word for it. Try it for yourself in the comfort of your own home. We're giving you the tools to do just that by putting our parser on the web. And if you think you have a better parser, we challenge you to do the same. So put up or shut up!

Stakes are high, folks. A fully efficient parser means that the days of human-machine interactivity are here at last. You will be able to talk to a machine and it will talk back to you. Not fake it--make it! It will provide a full analysis of your sentences, and you will be able to ask it questions and it will answer them.
Now here's Phil Bralich with the details of the challenge. Go for it!

Derek Bickerton
Professor Emeritus, University of Hawaii
Author of 'Roots of Language', 'Language and Species', 'Language and Human Behavior'.

PARSING CHALLENGE

Professor Derek Bickerton of the University of Hawaii (emeritus) and I (Philip A. Bralich, Ph.D.) have produced a parser from a theory of syntax that we developed over the last few years. We would like to invite readers on this list to try it and send us their comments. This is the fastest and most advanced parser available to date and, for that reason, it should be of value to linguists and programmers alike. While we will not disclose the exact nature of the theory that we created to do this, we can tell you that it is derived from the same basic tenets that syntacticians have been working with for the last 20 years.

Further, we would like to challenge other linguists and developers to put other such efforts on a web site so that we can all test the state of the art of this aspect of our field. If some researchers are reluctant to do so because they are worried about giving away proprietary information concerning their work, we would like to remind them that we are only asking them to show "what they can do" NOT "how they do it." That is, any parsers that are "out there" can be safely put on the web to allow others to try and compare them. We are offering this challenge to our contacts and friends both in the academic world and in the private sector with whom we've developed a relationship over the last few years.

Currently, the only such web site that we are aware of is at: www.georgetown.edu/compling. This site contains several parsers that illustrate the state of the art (outside of our offices).
Other sites that discuss parsers and computational linguistics, but which do not offer any parsers to try, are:

  www.dfki.uni.sb.de (also contains the Natural Language Software Registry)
  www.ai.uga.faculty
  www.boole.stanford.edu/pub/lingol.html
  www.sil.org/pcpatr
  www.cl.cam.ac.uk./ftp/nltools
  www.ims.uni-stuttgart.de/cuf

In order for a parser to be commercially viable or for it to be of value to academics, it must be able to meet a minimum set of criteria in the areas of analysis, evaluation, and manipulation of input strings. For this reason, we would like to propose a set of minimum standards that must be met in order to join this challenge. There have in the past been challenges based on such questionable criteria as "parsing sentences from the New York Times"; however, we propose that until the challenges which follow are met, no one is ready to begin such an endeavor. At this stage of the game, asking a parser to parse the New York Times is a little like asking speech recognition to handle dictation from a debate between 8 speakers from 8 dialect areas before it is asked to do a good job with dictation from one speaker.

MINIMUM STANDARDS OF THIS CHALLENGE

In addition to using a dictionary that is at least 25,000 words in size, working in real time, and handling sentences up to 12 or 14 words in length (the size required for most commercial applications), we suggest that parsers should also meet the following standards before engaging this challenge.

At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF STRINGS, the parser should:

  1) identify parts of speech,
  2) identify parts of sentence,
  3) identify internal clauses,
  4) identify sentence type (without using punctuation), and
  5) identify tense and voice in main and internal clauses.
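
As a rough illustration of what the five structural items amount to, here is a minimal Python sketch of a record that a conforming parser might return. It is not the format used by any parser mentioned in this message: the names `ClauseInfo`, `StructuralAnalysis` and `within_length_limit` are hypothetical, the tags are filled in by hand rather than computed, and the 14-word default simply takes the upper figure from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class ClauseInfo:
    """Tense and voice for one clause (standard 5)."""
    text: str
    tense: str   # e.g. "past"
    voice: str   # "active" or "passive"


@dataclass
class StructuralAnalysis:
    """Hypothetical container for the five structural-analysis items."""
    sentence: str
    parts_of_speech: list        # standard 1: (word, tag) pairs, in order
    parts_of_sentence: dict      # standard 2: role -> phrase(s)
    sentence_type: str           # standard 4: decided without punctuation
    main_clause: ClauseInfo      # standard 5
    internal_clauses: list = field(default_factory=list)  # standard 3


def within_length_limit(sentence, max_words=14):
    """Check the sentence-length ceiling (assumed here to be 14 words)."""
    return len(sentence.split()) <= max_words


SENTENCE = "Indiana Jones gave the treasure map to the beggar in Madrid"

# Hand-written analysis for illustration only; nothing here is parsed.
analysis = StructuralAnalysis(
    sentence=SENTENCE,
    parts_of_speech=[
        ("Indiana", "proper noun"), ("Jones", "proper noun"),
        ("gave", "verb"), ("the", "determiner"), ("treasure", "noun"),
        ("map", "noun"), ("to", "preposition"), ("the", "determiner"),
        ("beggar", "noun"), ("in", "preposition"), ("Madrid", "proper noun"),
    ],
    parts_of_sentence={
        "subject": "Indiana Jones",
        "verb": "gave",
        "direct object": "the treasure map",
        "prepositional phrases": ["to the beggar", "in Madrid"],
    },
    sentence_type="statement",
    main_clause=ClauseInfo(text=SENTENCE, tense="past", voice="active"),
)

print(within_length_limit(SENTENCE))
print(analysis.sentence_type, analysis.main_clause.tense, analysis.main_clause.voice)
```

A record like this only fixes what must be reported; producing it automatically for arbitrary input is the hard part that the challenge is about.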
At a minimum, from the point of view of EVALUATION OF STRINGS, the parser should:

  1) recognize acceptable strings,
  2) reject unacceptable strings,
  3) give the number of correct parses identified,
  4) identify what sort of items succeeded (e.g. sentences, noun phrases, adjective phrases, etc.),
  5) give the number of unacceptable parses that were tried, and
  6) give the exact time of the parse in seconds.

At a minimum, from the point of view of MANIPULATION OF STRINGS, the parser should:

  1) change questions to statements and statements to questions,
  2) change actives to passives in statements and questions and change passives to actives in statements and questions, and
  3) change tense in statements and questions.

We have several functions in addition to the ones listed above incorporated in our web site, but we believe all of those listed above are necessary in order to begin doing anything useful with a parser, such as giving game characters dialogue abilities or improving grammar checkers and translation devices. That is, if you can change active/passive and question/statement, that indicates that you can do the manipulations of strings that are required to allow question/answer, statement/response repartee with computer programs and on-screen characters. In particular, we offer one function which allows an individual to type in a sentence and then ask questions of it. This is described briefly in the login instructions which follow this message.

The parser that we have on the web site takes up less than 55 kilobytes of space and works with under one megabyte of RAM. There are approximately 25,000 lines of code, and it took about two man-years to bring it to this stage of development. The dictionary requires 2 megabytes of space.
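
To make the link between manipulation and question answering concrete, here is a toy Python sketch. It is emphatically not our parser and says nothing about how it works: the `Clause` record is filled in by hand for the sample sentence used in the login instructions, every name in it is hypothetical, and it only handles a simple past-tense active clause with a singular direct object. The point is merely that once the parts of the sentence are known, statement, question, passive, and answers can all be read off the same record.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """Hand-built analysis of one simple past-tense active clause."""
    subject: str
    verb_base: str        # "give"
    verb_past: str        # "gave"
    verb_participle: str  # "given"
    direct_object: str
    recipient: str        # object of "to"
    location: str         # object of "in"


def _tail(c):
    # Prepositional phrases shared by every rendering of the clause.
    return f"to {c.recipient} in {c.location}"


def statement(c):
    return f"{c.subject} {c.verb_past} {c.direct_object} {_tail(c)}"


def yes_no_question(c):
    # Statement -> question: insert "Did" and use the base verb form.
    return f"Did {c.subject} {c.verb_base} {c.direct_object} {_tail(c)}"


def passive(c):
    # Active -> passive: promote the direct object ("was" assumes singular).
    obj = c.direct_object[0].upper() + c.direct_object[1:]
    return f"{obj} was {c.verb_participle} {_tail(c)} by {c.subject}"


def answer(c, question):
    """Answer a wh-question by picking the matching part of the clause."""
    words = question.lower().split()
    if not words:
        return None
    if words[0] == "who":
        # "Who gave ..." asks for the subject; "Who did X give ..." for the recipient.
        if len(words) > 1 and words[1] == c.verb_past:
            return c.subject
        return c.recipient
    if words[0] == "what":
        return c.direct_object
    if words[0] == "where":
        return c.location
    return None


clause = Clause("Indiana Jones", "give", "gave", "given",
                "the treasure map", "the beggar", "Madrid")

print(statement(clause))
print(yes_no_question(clause))
print(passive(clause))
print(answer(clause, "Where did Indiana Jones give the treasure map"))
```

Because the answers are copied straight from the record, their wording can differ from what the web demo prints for the same questions.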
Before logging on, users might also want to familiarize themselves with the parsers at www.georgetown.edu/compling as well as the abilities of currently available commercial products such as grammar checkers, translation devices, and foreign language tutoring software to get some sense of what is currently available. Keep in mind, of course, that if anything other than what they find were available, it would have appeared on the market. That is, because of the tremendously competitive nature of this industry, companies release their state-of-the-art products rather than keep them bottled up until they are perfected.

To our knowledge there are no commercial organizations that can meet the minimum standards of this challenge. Thus, even if the developers of private sector parsers are unable or unwilling to join in this challenge, we can judge the state of the art by looking at available products. Certainly there is nothing available on the software market today that indicates these minimum standards can be met.

Further, to our knowledge there are no academic institutions that can meet the minimum standards of this challenge, even though there are rumored to be dozens of parsers "out there" that are up to the task. This is one of the main reasons we are issuing this challenge. Linguists in and out of academia need a forum by which to judge whether or not extant parsers measure up to the standards of the state of the art. To simply say there are lots of parsers "out there" without some standard forum for finding and judging them is to abdicate our responsibility as professionals in the field.

LOGIN INSTRUCTIONS:

The users of this web site are invited to focus on sentences that would be of most value to computer users or software developers. For example, you can type in "Indiana Jones gave the treasure map to the beggar in Madrid," for a talking game application, or "Show me flights from Honolulu to Tokyo," for a travel agency application.
You can follow this statement with questions and expect to receive relevant responses. For example:

  Who gave the treasure map to the beggar
      Indiana
  What did Indiana Jones give to the beggar
      the map
  Where did Indiana Jones give the treasure map
      Madrid
  Who did Indiana Jones give the treasure map
      the beggar
  Did Indiana Jones give the beggar a treasure map
      Yes
  Did Indiana Jones give the beggar a book
      No

To log on:

  1. Go to www.ergo-ling.com
  2. Click on "Parser Demo" in the "Restricted Access" section of the web page.
  3. Enter AT LEAST your name and email address, or it won't work.
  4. Read the instructions and type in sentences.

Please forward your comments to this list or to either Derek Bickerton or myself at:

  derek at Hawaii.edu
  bralich at Hawaii.edu

We are issuing this challenge to provide all linguists the opportunity to evaluate the many parsers that are supposedly "out there" and decide for themselves just what the state of the art is. For those who are unaware of what is entailed in putting a parser on a web site, the actual programming and set-up required should take less than one full week and cost less than $100. Much of the programming and set-up has undoubtedly been completed already by anyone who has a parser that can be taken seriously.

Using the programs we have developed, we are currently signing development contracts with game manufacturers and educational software developers. We are also developing ESL software for a large corporation in Japan with whom we are discussing the creation of a similar parser for Japanese. Thus, for those of you who are "out there" with parsers, you may want to join this challenge to help generate further development of this very important area of our field.

Finally, we would like to point out that parsers that are not available on the web are as suspect as theories that are unpublished.
In the interests of academic responsibility, if parsers are to be taken seriously they should be as publicly available as theories that would be taken seriously, that is, through some form of publication. The best medium for this for now is an Internet web site. This allows a developer to demonstrate his parser without compromising proprietary information. Maybe at some point in the future there will be "refereed web sites" just as there are refereed journals, but for now we will have to recognize that publicly accessible web sites provide an appropriate forum to decide whether or not a parser does indeed exist, what it can or cannot do, and whether or not it compares favorably with others. This will help us avoid the problem of people claiming there are lots of parsers "out there" without being required to demonstrate this in any way.

We will summarize to the list commercial and noncommercial sites as well as responses we get to this challenge.

TRY THIS PARSER AND ASK YOURSELF THE QUESTION: "COULD I MAKE COMMERCIALLY VIABLE SOFTWARE WITH THIS TOOL?"

Sincerely,

Philip A. Bralich, Ph.D.
President and CEO

P.S. Resumes from those with a background in syntax and in creating Natural Language dictionaries for the languages of Spanish, Russian, Arabic, Japanese and German are being accepted for positions beginning in June. Details will follow in later messages.

ERGO LINGUISTIC TECHNOLOGIES
Manoa Innovation Center
2800 Woodlawn Drive, Suite 175
Honolulu, Hawaii 96822
TEL: (808) 539-3920
FAX: (808) 539-3924