8.65, FYI: Parser Challenge

linguist at linguistlist.org
Tue Jan 21 21:15:54 UTC 1997


LINGUIST List:  Vol-8-65. Tue Jan 21 1997. ISSN: 1068-4875.

Subject: 8.65, FYI: Parser Challenge

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Sue Robinson <sue at linguistlist.org>
Technical Editor:  Ron Reck <ron at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Editor for this issue: Helen Dry <hdry at emunix.emich.edu>

=================================Directory=================================

1)
Date:  Tue, 31 Dec 1996 10:04:02 -1000
From:  Anne Sing <annes at htdc.org>
Subject:  PARSER CHALLENGE

-------------------------------- Message 1 -------------------------------

Date:  Tue, 31 Dec 1996 10:04:02 -1000
From:  Anne Sing <annes at htdc.org>
Subject:  PARSER CHALLENGE



A message from Derek Bickerton to the linguistic community and the world at
large:


It can't be done!
How often have we heard that?
It's obvious that no-one could devise an effective parser.  I mean, isn't
it?  After all, if it were possible, then Microsoft or some other
billion-dollar corporation would have done it already.  Wouldn't they?

Wrong.  You don't solve problems by throwing money at them.  You solve them
by trying a different way.  Six years ago, my former student Phil Bralich
and I started to develop a new theory of syntax.  Right then we decided
that instead of presenting our ideas to the academic community, we would put
them to a far sterner test....

THE MARKETPLACE!

So we have developed a parser.  We don't claim this parser is perfect--yet.
All we claim is that it is the best parser presently in existence, that it
will outperform any parser at present in existence, and that within a
relatively short space of time we will be able to parse all possible types
of English sentence.  So confident are we of this that we are now ready to
show you what we can do on our web site.  We are also willing to propose a
minimum set of standards that any parser must meet in order to call
itself "commercially viable."

Don't take our word for it.  Try it for yourself in the comfort of your own
home.  We're giving you the tools to do just that by putting our parser on
the web.  And if you think you have a better parser, we challenge you to do
the same.

So put up or shut up!

Stakes are high, folks.  A fully efficient parser means that the days of
human-machine interactivity are here at last.  You will be able to talk to a
machine and it will talk back to you.  Not fake it--make it!  It will provide
a full analysis of your sentences, and you will be able to ask it questions
and it will answer them.

Now here's Phil Bralich with the details of the challenge.

Go for it!

Derek Bickerton
Professor Emeritus, University of Hawaii
Author of  'Roots of Language',  'Language and Species',  'Language and
Human Behavior'.


PARSING CHALLENGE


Professor Derek Bickerton of the University of Hawaii (emeritus) and I
(Philip A. Bralich, Ph.D.) have produced a parser from a theory of
syntax that we developed over the last few years.  We would like to invite
readers on this list to try it and send us their comments.  This is the
fastest and most advanced parser available to date and, for that reason, it
should be of value to linguists and programmers alike. While we will not
disclose the exact nature of the theory that we created to do this, we can
tell you that it is derived from the same basic tenets that syntacticians
have been working with for the last 20 years.

Further, we would like to challenge other linguists and developers to put
other such efforts on a web site so that we can all test the state of the
art of this aspect of our field.  If some researchers are reluctant to do so
because they are worried about giving away proprietary information
concerning their work, we would like to remind them that we are only asking
them to show "what they can do" NOT "how they do it."  That is, any parsers
that are "out there" can be safely put on the web to allow others to try and
compare them.  We are offering this challenge to our contacts and friends
both in the academic world and those in the private sector with whom we've
developed a relationship over the last few years.

Currently, the only web site we are aware of that offers parsers to try is at:

www.georgetown.edu/compling.

This site contains several parsers that illustrate the state of the art
(outside of our offices). Other sites discuss parsers and computational
linguistics but do not offer any parsers to try:

www.dfki.uni.sb.de  (Also contains the Natural Language Software Registry)
www.ai.uga.faculty
www.boole.stanford.edu/pub/lingol.html
www.sil.org/pcpatr
www.cl.cam.ac.uk./ftp/nltools
www.ims.uni-stuttgart.de/cuf

In order for a parser to be commercially viable or for it to be of value to
academics, it must be able to meet a minimum set of criteria in the areas of
analysis, evaluation, and manipulation of input strings.  For this reason,
we would like to propose a set of minimum standards that must be met in
order to join this challenge.  There have in the past been challenges based
on such questionable criteria as "parsing sentences from the New York
Times"; however, we propose that until the challenges which follow are met,
no one is ready to begin such an endeavor.  At this stage of the game,
asking a parser to parse the New York Times is a little like asking speech
recognition to handle dictation from a debate between 8 speakers from 8
dialect areas before it is asked to do a good job with dictation from one
speaker.

MINIMUM STANDARDS OF THIS CHALLENGE
In addition to using a dictionary that is at least 25,000 words in size and
working in real time and handling sentences up to 12 or 14 words in length
(the size required for most commercial applications), we suggest that parsers
should also meet the following standards before engaging this challenge:

At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF STRINGS,
the parser should: 1) identify parts of speech, 2) identify parts of
sentence, 3) identify internal clauses, 4) identify sentence type (without
using punctuation), and 5) identify tense and voice in main and internal
clauses.

At a minimum, from the point of view of EVALUATION OF STRINGS, the parser
should: 1) recognize acceptable strings, 2) reject unacceptable strings, 3)
give the number of correct parses identified, 4) identify what sort of items
succeeded (e.g. sentences, noun phrases, adjective phrases, etc.), 5) give
the number of unacceptable parses that were tried, and 6) give the exact
time of the parse in seconds.
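
To make the six evaluation items concrete, here is a minimal Python sketch
of the kind of report a parser front end might return.  It is an
illustration only and says nothing about how the parser described in this
message works internally.  The names ParseReport, evaluate, and toy_parser
are hypothetical, and toy_parser is a trivial stand-in, not a real parser.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ParseReport:
    """Hypothetical record of the six evaluation items listed above."""
    accepted: bool                 # items 1 and 2: accept or reject the string
    parse_count: int               # item 3: number of successful parses
    item_types: list = field(default_factory=list)  # item 4: e.g. "sentence"
    failed_attempts: int = 0       # item 5: parses tried and abandoned
    seconds: float = 0.0           # item 6: elapsed time of the parse


def evaluate(sentence, parse_fn):
    """Run parse_fn on sentence and wrap its result in a ParseReport.

    parse_fn is assumed to return (parses, failed_attempts), where each
    parse is an (item_type, tree) pair.  This interface is an assumption
    made for the sketch.
    """
    start = time.perf_counter()
    parses, failed = parse_fn(sentence)
    elapsed = time.perf_counter() - start
    return ParseReport(
        accepted=bool(parses),
        parse_count=len(parses),
        item_types=sorted({kind for kind, _tree in parses}),
        failed_attempts=failed,
        seconds=elapsed,
    )


def toy_parser(sentence):
    """Toy stand-in: accepts only strings that contain the word 'gave'."""
    words = sentence.lower().split()
    if "gave" in words:
        return [("sentence", words)], 2
    return [], 3
```

Calling evaluate("Indiana Jones gave the map to the beggar", toy_parser)
yields a report with accepted set to True and one parse of type "sentence";
a string without "gave" is rejected with zero parses.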

At a minimum, from the point of view of MANIPULATION OF STRINGS, the parser
should: 1) change questions to statements and statements to questions, 2)
change actives to passives in statements and questions and change passives
to actives in statements and questions, and 3) change tense in statements
and questions.
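
As a rough illustration of the first manipulation, the Python sketch below
turns a statement into a yes/no question by moving the first auxiliary verb
to the front.  This is a deliberately naive word-level heuristic under
stated assumptions, not the method used by the parser described in this
message: the AUXILIARIES list and the function name statement_to_question
are hypothetical, the output is lower-cased and unpunctuated, and
statements without an auxiliary (which would need "do") are not handled.

```python
# Small illustrative list of auxiliaries; a real system would consult
# its dictionary instead.
AUXILIARIES = {"is", "are", "was", "were", "has", "have", "had",
               "will", "can", "could", "should", "would", "must"}


def statement_to_question(statement):
    """Front the first auxiliary found in the statement.

    Returns the question as a lower-case string without punctuation,
    or None when the statement contains no auxiliary from the list.
    """
    words = statement.lower().rstrip(".").split()
    for i, word in enumerate(words):
        if word in AUXILIARIES:
            # Move the auxiliary in front of everything else.
            return " ".join([word] + words[:i] + words[i + 1:])
    return None
```

For "The beggar has taken the map." this returns "has the beggar taken the
map".  For "the man who is tall is happy" it returns the ill-formed "is the
man who tall is happy", because the first auxiliary sits inside the
subject.  Getting that case right requires knowing where the subject ends,
which is why the manipulations above presuppose a structural analysis.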

Our web site incorporates several functions in addition to the ones listed
above, but we believe all of those listed above are necessary in order to
begin doing anything useful with a parser, such as giving game characters
dialogue abilities or improving grammar checkers and translation devices.
That is, if you can change active/passive and question/statement, that
indicates that you can do the manipulations of strings that are required
to allow question/answer, statement/response repartee with computer programs
and on-screen characters.

In particular, we offer one function which allows an individual to
type in a sentence and then ask questions of it.  This is described briefly
in the login instructions which follow this message.

The parser that we have on the web site takes up less than 55 kilobytes
of space and works with under one megabyte of RAM.  There are approximately
25,000 lines of code and it took about two man years to bring it to this
stage of development.  The dictionary requires 2 megabytes of space.

Before logging on, users might also want to familiarize themselves with the
parsers at www.georgetown.edu/compling as well as the abilities of currently
available commercial products such as grammar checkers, translation devices,
and foreign language tutoring software to get some sense of what is currently
available.  They should keep in mind, of course, that if anything other than
what they find were available, it would have appeared on the market.  That
is, because of the tremendously competitive nature of this industry,
companies release their state of the art products rather than keep them
bottled up until they are perfected.

To our knowledge there are no commercial organizations that can meet the
minimum standards of this challenge.  Thus, even if the developers of
private sector parsers are unable or unwilling to join in this challenge, we
can judge the state of the art by looking at available products.  Certainly
there is nothing available on the software market today that indicates these
minimum standards can be met.  Further, to our knowledge there are no
academic institutions that can meet the minimum standards of this challenge,
even though there are rumored to be dozens of parsers "out there" that are
up to the task.  This is one of the main reasons we are issuing
this challenge.  Linguists in and out of academia need a forum by which to
judge whether or not extant parsers measure up to the standards
of the state of the art.  To simply say there are lots of parsers "out
there" without some standard forum for finding and judging them is to
abdicate our responsibility as professionals in the field.

LOGIN INSTRUCTIONS:
The users of this web site are invited to focus on sentences that would be
of most value to computer users or software developers.  For example, you
can type in "Indiana Jones gave the treasure map to the beggar in Madrid,"
for a talking game application, or "Show me flights from Honolulu to Tokyo,"
for a travel agency application. You can follow this statement with questions
and expect to receive relevant responses. For example,

        Who gave the treasure map to the beggar             Indiana
        What did Indiana Jones give to the beggar           the map
        Where did Indiana Jones give the treasure map       Madrid
        Who did Indiana Jones give the treasure map         the beggar
        Did Indiana Jones give the beggar a treasure map    Yes
        Did Indiana Jones give the beggar a book            No
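
The question-and-answer exchange above can be pictured with the following
toy Python sketch.  It assumes the statement has already been analyzed into
a small table of roles; that table (FRAME) is written by hand here, and the
role names and the function answer are hypothetical.  The sketch does not
reproduce the exact wording of the answers shown above, does not handle the
yes/no questions, and is not a description of how the demo works.

```python
# Hand-built stand-in for what an analysis of
# "Indiana Jones gave the treasure map to the beggar in Madrid"
# might provide.  A real parser would derive this from the sentence.
FRAME = {
    "agent": "Indiana Jones",
    "theme": "the treasure map",
    "recipient": "the beggar",
    "location": "Madrid",
}


def answer(question, frame):
    """Map a wh-question to one role of the frame (toy heuristic)."""
    words = question.lower().rstrip("?").split()
    if not words:
        return None
    if words[0] == "where":
        return frame.get("location")
    if words[0] == "what":
        return frame.get("theme")
    if words[0] == "who":
        # "who did X give ..." asks about the recipient;
        # "who gave ..." asks about the agent.
        if len(words) > 1 and words[1] == "did":
            return frame.get("recipient")
        return frame.get("agent")
    # Yes/no questions and anything else are outside this sketch.
    return None
```

With this table, answer("Who did Indiana Jones give the treasure map",
FRAME) picks out the recipient and answer("Where did Indiana Jones give the
treasure map", FRAME) picks out the location.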

To log on:
1.      Go to www.ergo-ling.com
2.      Click on "Parser Demo" in the "Restricted Access" section of the web
        page.
3.      Input AT LEAST name and email address or it won't work.
4.      Read the instructions and type in sentences.


Please forward your comments to this list or to either Derek Bickerton or
myself at:

derek at Hawaii.edu

bralich at Hawaii.edu

We are issuing this challenge to provide all linguists the opportunity to
evaluate the many parsers that are supposedly "out there" and decide for
themselves just what the state of the art is.

For those who are unaware of what is entailed in putting a parser on a web
site, the actual programming and set-up required should take less than one
full week and cost less than $100.  Much of the programming and
set-up has undoubtedly been completed by anyone who has a parser that can
be taken seriously.

Using the programs we have developed, we are currently signing development
contracts with game manufacturers and educational software developers.  We
are also developing ESL software for a large corporation in Japan with whom
we are discussing the creation of a similar parser for Japanese.  Thus, for
those of you who are "out there" with parsers, you may want to join this
challenge to help generate further development of this very important area
of our field.

Finally, we would like to point out that parsers that are not available on
the web are as suspect as theories that are unpublished. In the interests of
academic responsibility, if parsers are to be taken seriously they should be
as publicly available as theories that would be taken seriously, i.e. they
should be subject to a requirement of some sort of publication.  The best
medium for this for now is an Internet web site.  This allows a developer to
demonstrate his parser without compromising proprietary information. Maybe at
some point in the future, there will be "refereed web sites" just as there
are refereed journals, but for now, we will have to recognize that publicly
accessible web sites provide an appropriate forum to decide whether or not a
parser does indeed exist, what it can or cannot do, and whether or not it
compares favorably with others. This will help us avoid the problems of
people claiming there are lots of parsers "out there" without being required to
demonstrate this in any way.

We will summarize to the list commercial and noncommercial sites as well as
responses we get to this challenge.

TRY THIS PARSER AND ASK YOURSELF THE QUESTION:  "COULD I MAKE COMMERCIALLY
VIABLE SOFTWARE WITH THIS TOOL?"

Sincerely,

Philip A. Bralich, Ph.D.
President and CEO

P.S.  Resumes from those with a background in syntax and in creating Natural
Language dictionaries for the languages of Spanish, Russian, Arabic,
Japanese and German are being accepted for positions beginning in June.
Details will follow in later messages.


ERGO LINGUISTIC TECHNOLOGIES
Manoa Innovation Center
2800 Woodlawn Drive, Suite 175
Honolulu, Hawaii  96822
TEL: (808) 539-3920
FAX: (808) 539-3924

---------------------------------------------------------------------------
LINGUIST List: Vol-8-65


