FW: Parser Challenge
Philip A. Bralich, Ph.D.
bralich at HAWAII.EDU
Fri Jan 3 20:32:09 UTC 1997
At 08:11 AM 1/3/97 -1000, Paul Deane wrote:
>After reading the recent postings on FUNKNET about the parser challenge,
>I went to the Ergo parser site and tried it out. I was particularly
>interested since I have worked with the Link Grammar parser extensively,
>and other parsers, and so I have a pretty good idea what the state of
>the art looks like.
>
>The functionality built into the Ergo interface is very nice: certainly
>it is an advantage, for the purposes of evaluating parsers, being able
>to get the grammatical analysis output directly in a simple and
>easily understood format. And such functionalities as getting
>transformational variants of sentences (especially question-answer
>pairs) is of obvious commercial benefit. (Though there are certainly
>other sites with such functionality. Usually, though, that is something
>built for a particular application on top of a parser engine, rather
>than being built into the parser. It would be nice as a standard parser
>feature though.)
This is the main point of our challenge. We chose these criteria because they
demonstrate to anyone the ability to do the basic tasks that underlie any
real-world parsing job: naming part of speech, part of sentence, tense, sentence
type, internal clauses, and so on. Merely claiming these abilities, or
making them visible only to those who know the theory, is not enough.
>Leaving that aside, I found the performance of the Ergo parser
>As a practical test, I took one of the emails sent out from Ergo, and
>tried variants of the sentences in it. By doing this, I avoided the trap
>of trying simple garden-variety "example sentences" (which just about
>any parser can handle) in favor of the variety of constructions you can
>actually get in natural language text. But I reworded it slightly where
>necessary to eliminate fragments and colloquialisms and to get it into
>the 12-14 word length limit. That meant in most cases I had to try a
>couple of variants involving parts of sentences, since most of the
>sentences in the email were over the 12-14 word limit.
This is a somewhat odd set of sentences to begin with, though not completely
unfair. We are suggesting that the problem in parsing is that most
people are not handling anything properly. That is, most cannot handle
the analysis of small or medium sentences properly. So while the
sentences you put in may be at our current upper limit for length (partly
because our dictionary is only 60,000 words in size), we still have no
evidence that any other parser can do a full parse of small and medium
sentences.
The point of the challenge is to establish very tough criteria and then
apply them to small, then medium, then larger sentences.
The sentences entered in this test will be working in just a few weeks,
but no other parser meets our challenge for small or medium-sized sentences.
We need to look at all parsers against all these criteria, from small to large.
By the way, our current development will allow us to take large steps forward
every two months for the next year. After that we should level out.
The main points are these:
1. All parsers should be held to the task of labelling parts of speech,
parts of the sentence, sentence type, and tense and voice, as well as being
able to manipulate strings: changing actives to passives, statements to
questions, and so on.
This after all is what parsing is. Creating trees is a preliminary step toward
formulating these generalizations about the syntax of the language you are
analyzing.
2. These criteria should be applied to small, medium, and large sentences.
3. As our parser improves we will hold to these criteria for all size
sentences.
4. As it is, only our parser can do all this for sentences of ANY size. The
claims of other parsers are merely assertions until they provide these
functions on a web site that all can see.
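
To make the criteria in points 1 and 2 concrete, here is a minimal sketch of how they could be written down as a checklist and applied by sentence size. It is only an illustration: the criterion names, the word-count boundaries for "small" and "medium", and the dictionary format of a parser's analysis are all assumptions, not part of the Ergo parser or of any other parser's interface.

```python
# Criteria named in point 1; the identifiers themselves are illustrative.
CRITERIA = (
    "parts_of_speech",
    "parts_of_sentence",
    "sentence_type",
    "tense",
    "voice",
    "active_to_passive",
    "statement_to_question",
)


def size_class(sentence, small_max=6, medium_max=14):
    """Bucket a sentence by word count (boundaries are hypothetical)."""
    n = len(sentence.split())
    if n <= small_max:
        return "small"
    if n <= medium_max:
        return "medium"
    return "large"


def missing_criteria(analysis):
    """Return the criteria for which the analysis reports nothing."""
    return [c for c in CRITERIA if not analysis.get(c)]


def meets_challenge(results):
    """Summarize, per size class, whether every sentence met every criterion.

    `results` is a list of (sentence, analysis) pairs, where `analysis` is a
    dict keyed by the names in CRITERIA (an assumed format).
    """
    summary = {}
    for sentence, analysis in results:
        cls = size_class(sentence)
        ok = not missing_criteria(analysis)
        summary[cls] = summary.get(cls, True) and ok
    return summary
```

A parser would count as meeting the challenge for a size class only if every sentence of that size came back with all of the criteria filled in; a single missing item fails the whole class.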
Phil Bralich
Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
Tel: (808)539-3920
Fax: (808)539-3924