FW: Parser Challenge

Paul Deane PDeane at DATAWARE.COM
Fri Jan 3 18:11:00 UTC 1997


After reading the recent postings on FUNKNET about the parser challenge,
I went to the Ergo parser site and tried it out. I was particularly
interested since I have worked with the Link Grammar parser extensively,
and other parsers, and so I have a pretty good idea what the state of
the art looks like.

The functionality built into the Ergo interface is very nice: certainly
it is an advantage, for the purposes of evaluating parsers, to be able
to get the grammatical analysis output directly in a simple and
easily understood format. And functionality such as producing
transformational variants of sentences (especially question-answer
pairs) is of obvious commercial benefit. (Though there are certainly
other sites with such functionality. Usually, though, that is something
built for a particular application on top of a parser engine, rather
than being built into the parser. It would be nice as a standard parser
feature, though.)

Leaving that aside, I found the performance of the Ergo parser
substantially below the state of the art on the most important criterion:
being able to parse sentences reliably - at least, judging by the web
demo (though there are some risks in doing so, of course, since it is
always possible that performance problems are the result of incidental
bugs rather than of the fundamental engine or its associated database).
Quite frankly, though, the self-imposed limitation of 12-14 words
concerned me right off the bat, since most of the nastiest problems with
parsers compound exponentially with sentence length. But I decided to
try it out within those limitations.
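(To give a sense of why length matters so much: at a purely structural
level, the number of possible binary-branching trees over n words grows
with the Catalan numbers, so the ambiguity a parser must prune explodes
as sentences get longer. The little computation below is just that piece
of combinatorics - an illustration, not a claim about Ergo's algorithm.)

```python
from math import comb

def catalan(n):
    # nth Catalan number: C(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

# Number of distinct binary bracketings of a sentence of n words
# is catalan(n - 1). Note the growth between 5 and 14 words:
for words in (5, 10, 14):
    print(words, catalan(words - 1))
# 5 words -> 14 trees, 10 words -> 4862, 14 words -> 742900
```

Of course real grammars rule out most of these bracketings, but the
underlying search space is why naive strategies bog down on long input.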

As a practical test, I took one of the emails sent out from Ergo, and
tried variants of the sentences in it. By doing this, I avoided the trap
of trying simple garden-variety "example sentences" (which just about
any parser can handle) in favor of the variety of constructions you can
actually get in natural language text. But I reworded it slightly where
necessary to eliminate fragments and colloquialisms and to get it into
the 12-14 word length limit. That meant in most cases I had to try a
couple of variants involving parts of sentences, since most of the
sentences in the email were over the 12-14 word limit.

Here were the results:

I didn't realize it but our head programmer was here last night.
        -- did not parse

I fixed the sentences that Mr. Sleator said didn't work.
        -- failed to return a result at all within a reasonable time;
           I turned it off and tried another sentence after about ten minutes.

Our verb section of our dictionary on the web was corrupted.
        -- parsed in a reasonable time.

Part of the problem was that our dictionary was corrupted.
        -- took 74.7 seconds to parse.

It is easy for us to update and repair problems with our parser.
        -- again, it failed to return a result in a reasonable time.

This is something that most others cannot handle.
        -- did not parse.

Even minor repairs take months.
        -- again, it failed to return a result in a reasonable time.

I am not particularly surprised by these results. Actual normal use of
language has thousands of particular constructions that have to be
explicitly accounted for in the lexicon, so even if the parser engine
Ergo uses is fine, the database could easily be missing a lot of the
constructions necessary to handle unrestricted input robustly. Even the
best parsers I have seen need significant work on minor constructions;
but these sentences ought to parse. They are perfectly ordinary English
text (and in fact all but one of them parses in less than a second on
the parser I am currently using).

No doubt the particular problems causing trouble with these sentences
can be fixed quickly (any parser which properly separates parse engine
from rule base should be easy to modify quickly) but the percentage of
sentences that parsed suggests that there's a fair bit of work left to
be done here.
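(For readers unfamiliar with what that engine/rule-base separation looks
like in practice, here is a minimal sketch: a toy CKY recognizer with an
invented four-word grammar. Everything here is made up for illustration -
it is not Ergo's or Link Grammar's actual machinery. The point is that
the recognizer code is generic, while all the linguistic knowledge sits
in data tables that can be edited without touching the engine.)

```python
from itertools import product

# Rule base: all linguistic knowledge lives in these two tables
# (a toy grammar in Chomsky normal form, invented for this sketch).
LEXICON = {
    "our": {"Det"}, "dictionary": {"N"},
    "was": {"V"}, "corrupted": {"A"},
}
RULES = {  # (B, C) -> set of A such that A -> B C
    ("Det", "N"): {"NP"},
    ("V", "A"): {"VP"},
    ("NP", "VP"): {"S"},
}

def recognize(words, start="S"):
    """Generic CKY engine: knows nothing about English, only the tables."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= RULES.get((b, c), set())
    return start in chart[0][n]

print(recognize("our dictionary was corrupted".split()))  # True
```

Fixing a coverage gap in a design like this means editing LEXICON or
RULES, not recompiling the engine - which is why, in principle, the
sorts of failures listed above should be quick to repair.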
