8.135, Disc: Parsers

linguist at linguistlist.org
Thu Jan 30 18:15:49 UTC 1997


LINGUIST List:  Vol-8-135. Thu Jan 30 1997. ISSN: 1068-4875.

Subject: 8.135, Disc: Parsers

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Sue Robinson <sue at linguistlist.org>
Technical Editor:  Ron Reck <ron at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/

Editor for this issue: Susan Robinson <sue at linguistlist.org>

=================================Directory=================================

1)
Date:  Tue, 21 Jan 1997 17:10:00 -0500
From:  Paul Deane <PDeane at dataware.com>
Subject:  RE: Parser Challenge

2)
Date:  Tue, 21 Jan 1997 18:48:54 -1000
From:  Anne <annes at htdc.org>
Subject:  Re: 8.66, Disc: Parsers

3)
Date:  Wed, 22 Jan 1997 17:10:15 -0700
From:  hammond at U.Arizona.EDU (Mike Hammond)
Subject:  parsing SYLLABLES

-------------------------------- Message 1 -------------------------------

Date:  Tue, 21 Jan 1997 17:10:00 -0500
From:  Paul Deane <PDeane at dataware.com>
Subject:  RE: Parser Challenge

The "Parser Challenge" post has already appeared on several other
linguistics listservs, where it generated considerable discussion.

Since the post has now been posted to Linguist in its original form
without any reference to that earlier discussion, I am forwarding a
message I originally sent to the Funknet mailing list after the Parser
Challenge posting appeared there.

 ----------
From: Paul Deane
To: Multiple recipients of list FUNKNET
Subject: FW: Parser Challenge
Date: January 3, 1997 1:11PM


After reading the recent postings on FUNKNET about the parser
challenge, I went to the Ergo parser site and tried it out. I was
particularly interested since I have worked with the Link Grammar
parser extensively, and other parsers, and so I have a pretty good
idea what the state of the art looks like.

The functionality built into the Ergo interface is very nice:
certainly it is an advantage, for the purposes of evaluating parsers,
to get the grammatical analysis output directly in a simple and
easily understood format. And such functionality as getting
transformational variants of sentences (especially question-answer
pairs) is of obvious commercial benefit. (There are certainly other
sites with such functionality, but usually it is something built for
a particular application on top of a parser engine, rather than into
the parser itself. It would be nice as a standard parser feature,
though.)

Leaving that aside, I found the performance of the Ergo parser
substantially below the state of the art on the most important
criterion: being able to parse sentences reliably - at least, judging
by the web demo (though there are some risks in doing so, since it is
always possible that performance problems are the result of
incidental bugs rather than of the fundamental engine or its
associated database). Quite frankly, though, the self-imposed limit
of 12-14 words concerned me right off the bat, since most of the
nastiest problems with parsers compound exponentially with sentence
length. But I decided to try it out within those limitations.

As a practical test, I took one of the emails sent out from Ergo, and
tried variants of the sentences in it. By doing this, I avoided the
trap of trying simple garden-variety "example sentences" (which just
about any parser can handle) in favor of the variety of constructions
you can actually get in natural language text. But I reworded it
slightly where necessary to eliminate fragments and colloquialisms and
to get it into the 12-14 word length limit. That meant in most cases I
had to try a couple of variants involving parts of sentences, since
most of the sentences in the email were over the 12-14 word limit.

Here were the results:

I didn't realize it but our head programmer was here last night.
        -- did not parse

I fixed the sentences that Mr. Sleator said didn't work.
        -- failed to return a result within a reasonable time; I
           turned it off and tried another sentence after about ten
           minutes

Our verb section of our dictionary on the web was corrupted.
        -- parsed in a reasonable time

Part of the problem was that our dictionary was corrupted.
        -- took 74.7 seconds to parse

It is easy for us to update and repair problems with our parser.
        -- again, failed to return a result in a reasonable time

This is something that most others cannot handle.
        -- did not parse

Even minor repairs take months.
        -- again, failed to return a result in a reasonable time

I am not particularly surprised by these results. Actual normal use of
language has thousands of particular constructions that have to be
explicitly accounted for in the lexicon, so even if the parser engine
Ergo uses is fine, the database could easily be missing a lot of the
constructions necessary to handle unrestricted input robustly. Even the
best parsers I have seen need significant work on minor constructions;
but these sentences ought to parse. They are perfectly ordinary English
text (and in fact all but one parses in less than a second on the
parser I am currently using).

No doubt the particular problems causing trouble with these sentences
can be fixed quickly (any parser which properly separates parse engine
from rule base should be easy to modify quickly) but the percentage of
sentences that parsed suggests that there's a fair bit of work left to
be done here.
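The evaluation procedure described above - feeding variant sentences
to a parser and recording whether and how quickly they parse - can be
sketched as a small harness. This is a hypothetical illustration:
parse() is a stand-in for whatever parser is under test, not the Ergo
or Link Grammar API.

```python
# A sketch of the evaluation procedure described above: run each test
# sentence through a parser and record whether it parsed and how long
# it took.  The parse() argument is a hypothetical stand-in for the
# parser under test.
import time

def evaluate(parse, sentences, time_limit=60.0):
    """Return per-sentence results and the fraction that parsed.

    A sentence counts as a failure if the parser rejects it or takes
    longer than time_limit seconds.  (A real harness would enforce the
    limit with a subprocess timeout; here we only measure elapsed time
    after the fact.)
    """
    results = []
    for sentence in sentences:
        start = time.monotonic()
        accepted = bool(parse(sentence))
        elapsed = time.monotonic() - start
        results.append((sentence, accepted and elapsed <= time_limit, elapsed))
    parsed = sum(1 for _, ok, _ in results if ok)
    return results, parsed / len(results)
```

Run over the seven test sentences above, a harness like this would
report a parse rate of 2/7 for the web demo.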


-------------------------------- Message 2 -------------------------------

Date:  Tue, 21 Jan 1997 18:48:54 -1000
From:  Anne <annes at htdc.org>
Subject:  Re: 8.66, Disc: Parsers


>Philip Bralich suggests that those of us working in the area of parsing
>should make our systems available via the web.  Davy Temperley and I are
>in full agreement with this.  That's why a demonstration of our link
>grammar system has been up on the web for over a year.  Go to
>"www.cs.cmu.edu/~sleator" and click on "link grammar" to get to the
>parser page.
>
>Philip has also proposed a set of criteria by which parsing systems can
>be judged:
>
>> In addition to using a dictionary that is at least 25,000 words in
>> size and working in real time and handling sentences up to 12 or 14
>> words in length (the size required for most commercial applications),
>> we suggest that parsers should also meet the following standards
>> before engaging this challenge:
>>

But your system simply does not manipulate sentences, nor does it
label parts of speech or parts of a sentence with categories that are
meaningful to others.  For example, you have categories such as "W"
and "Z" that are simply opaque to anyone outside your theory.

>Whether or not anybody else agrees that these are the right desiderata,
>it's useful that he's put them forward.  We can use them to evaluate
>our own work, and Bralich's work as well.  We have done this, and
>it seems to us that our system is superior to Bralich's.

Well, we suggest that readers try it for themselves.

> We have worked out a sentence
>constructing mechanism that we believe would be able to handle this as
>well.  Of course we'll have to do the work to make this convincing.  We
>may be inspired to add this feature as a result of these discussions.

Yes.  Let's see that.  That is all we are suggesting.  We are merely
looking to bring the same level of accountability to this discussion
that publication brings to authors.  Your statements, or ours, about
the wonderfulness of our products will then be open to the judgement
of others.  And we definitely suggest that users try all the parsers
they can and then make up their own minds.  Our main contention is
that your unwillingness to put things on the web in a manner that all
can judge and evaluate is at best suspicious.

>
>Bralich's aim is to build a parser that will be useful for interactive
>games and other applications. It is therefore restricted to short
>sentences, and has a fairly small vocabulary.  However, even with these
>constraints, there are a number of very basic constructions that his
>parser cannot handle. Here are some examples. All of the sentences below
>are simply rejected by his parser.
>
>	I went out		The parser does not allow two-word verbs
>	He came in		like "set up", "go out", "put in", which are
>	He sent it off		extremely common.
>	I set it up

This is simply not true.  Try it yourselves.

>
>	He did it quickly	The parser seems to have extremely limited
>				use of adverbs. (It does accept some
>				constructions of this type, like "He ran
>				quickly", so perhaps this is a bug.)
>
>	John and Fred are here	The parser does not know that conjoined
>				singular noun phrases take plural verbs.
>
>	The dog jumped and the  The parser does not seem to
>	cat ran			accept ANY sentences in which clauses
>				are joined with conjunctions.
>
>	He said he was coming	The parser accepts "He said THAT he was
>				coming"; but it does not allow deletion of
>				"THAT", which is extremely common with some
>				verbs
>
>	I made him angry	There are a number of kinds of verb
>	I saw him leave		verb complements which the parser does
>	I suggested he go	not handle: direct object + adjective
>				("I made him angry"), direct object +
>				infinitive ("I saw him leave"),
>				subjunctive ("I suggested [that] he go").
>
>	His attempt to do it	The parser cannot handle nouns that take
>	was a failure		infinitives.
>
>	I went to the store 	The parser cannot handle the extremely
>	to get some milk	common use of infinitive phrases meaning
>				"In order to".

These also parse correctly, though there was a time a few weeks ago
when there may have been some problems.

>
>There are also cases where the parser assigns the wrong interpretation
>to sentences. One of the biggest problems here is in the treatment of
>verbs. Verbs in English take many different kinds of complements: direct
>objects, infinitives, clauses, indirect questions, adjectives, object +
>clause, and so on. The Ergo Parser seems to treat all of these
>complements as direct objects, and makes no distinctions between which
>verbs take which kind. This means, in the first place, that it will
>accept all kinds of strange sentences like "I chased that he came",
>blithely labeling the embedded clause as an object of "chased". More
>seriously, this often causes it to assign the wrong interpretation to
>sentences. For example,
>
>	I left when he came
>
>The verb "left" can be either transitive or intransitive. Here, it is
>clearly being used intransitively, with "when he came" acting as a
>subordinate clause. But the Ergo Parser treats "when he came" as a
>direct object.
>				
>The program does not seem to analyze relative clauses at all. In
>the sentence
>
>	The dog I saw was black

>the parser states that "I" is the subject of "saw", and that "The dog I
>saw" is the subject of "was", but does not state that "dog" is the
>object of "saw". The program also accepts "The dog I died was black"
>(analyzing it in the same way), further indicating that it simply has no
>understanding of relative clauses.

You should check this again.

>
>In the sentence "How big is it", the program analyzes "how big" as the
>subject of the sentence.
>
>We were able to identify all these problems with the Ergo parser without
>knowing anything about how it works -- the formalism used is
>proprietary.  A plethora of new problems would probably emerge if we
>knew how it worked.  And all of these problems will probably be
>exacerbated with longer sentences.

The main problem with Sleator's and other parsers is that the
majority of what they claim is nothing more than claims.  We put our
parser on the web, with all its good and bad points, for all to
judge.  We merely ask that others do the same so that all can judge
the state of the art.

>All of these problems with the Ergo Parser - constructions that it does
>not accept, and things that it mis-analyzes - are things that our system
>handles well.

Mere assertion.

>Indeed, the _original_ 1991 version of our parser could
>handle all these things. In our version 2.0, released in 1995, we
>incorporate many constructions which are less common. We should point
>out that even the latest version of our parser is far from perfect. It
>finds complete, correct parses for about 80% of Wall Street Journal
>sentences.

If it has been around since 1991, why can it not do the simplest
manipulations of strings - for example, change a passive to an active
or a statement to a question?
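For illustration, the statement-to-question manipulation being asked
about can be sketched in a few lines. This is a toy, hypothetical
fragment (not the API of any parser under discussion), and it assumes
a subject-initial declarative with an overt auxiliary; a real system
would transform the full parse, which is exactly what makes the
manipulation reliable.

```python
# Toy sketch of one "simple manipulation of strings": turn a
# declarative into a yes/no question by fronting the auxiliary.
# Hypothetical code; a real system would operate on the parse tree,
# not a word list.
AUXILIARIES = {"is", "are", "was", "were", "am", "can", "could", "will",
               "would", "shall", "should", "may", "might", "must",
               "do", "does", "did", "has", "have", "had"}

def statement_to_question(sentence):
    """Front the first auxiliary; return None if there is none
    (do-support would require morphological analysis)."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in AUXILIARIES:
            rest = words[:i] + words[i + 1:]
            # Downcase the old sentence-initial word.  (This wrongly
            # lowercases proper nouns; the parse would tell us which
            # words those are.)
            if rest[0] != "I":
                rest[0] = rest[0][0].lower() + rest[0][1:]
            return " ".join([word.capitalize()] + rest) + "?"
    return None
```

For example, statement_to_question("He was coming.") gives
"Was he coming?".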


>The reader can try both systems for himself or herself, and come to
>his/her own conclusions.  (The Ergo parser is at www.ergo-ling.com, ours
>is at www.cs.cmu.edu/~sleator.)


Yes, and do take these comments (and ours) with a grain of salt.  Run
the same sentences on both parsers and then decide for yourself what
they can and cannot do.  But please, do not take assertions of
parsing for parsing.  If you do not see it on the web, it is probably
not possible.

Phil Bralich



-------------------------------- Message 3 -------------------------------

Date:  Wed, 22 Jan 1997 17:10:15 -0700
From:  hammond at U.Arizona.EDU (Mike Hammond)
Subject:  parsing SYLLABLES

All:

There was a recent posting recommending that computational proposals,
specifically parsers, should be made available on the web.  It also
listed the URLs of a number of parsers.

I want to make several points about this.

1. I agree.

2. I have a parser of my own that has been up and running on the web. The
relevant URL is as follows.
        http://www.u.arizona.edu/~hammond

3. The parser above parses segments into syllables. It's not terribly
robust, but it exemplifies i) a particular strategy for implementing
Optimality Theory, and ii) a proposal to account for certain
psycholinguistic effects.
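A rough sketch of what such an implementation strategy for Optimality
Theory can look like (a toy reconstruction under simplifying
assumptions, not Hammond's actual system): enumerate every candidate
syllabification of a CV skeleton, and let ranked constraint
violations pick the winner.

```python
# Toy OT-style syllabifier over CV skeletons: GEN enumerates every way
# of cutting the string into syllables; EVAL picks the candidate with
# the fewest violations of ranked constraints.  For illustration only.
from itertools import combinations

def gen(skeleton):
    """All segmentations of the segment string into syllables."""
    n = len(skeleton)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield [skeleton[a:b] for a, b in zip(bounds, bounds[1:])]

def violations(syllables):
    nucleus = sum(abs(s.count("V") - 1) for s in syllables)  # one V per syllable
    onset = sum(1 for s in syllables if not s.startswith("C"))  # ONSET
    nocoda = sum(1 for s in syllables if s.endswith("C"))       # NOCODA
    # Ranking NUCLEUS >> ONSET >> NOCODA; tuple comparison gives
    # strict domination.
    return (nucleus, onset, nocoda)

def syllabify(skeleton):
    return min(gen(skeleton), key=violations)
```

With this ranking, syllabify("CVCVC") returns ["CV", "CVC"]: the
medial consonant is parsed as an onset rather than a coda, since
ONSET dominates NOCODA.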

Mike Hammond

********************************************************************

PLEASE DIRECT ALL EMAIL TO: hammond at u.arizona.edu

The following address will cease to work shortly: hammond at ccit.arizona.edu

Michael Hammond
Department of Linguistics
University of Arizona
Tucson, AZ 85721

phone: (520) 621-5759, 621-6897
fax: (520) 626-9014
email: hammond at u.arizona.edu
www: http://aruba.ccit.arizona.edu/~hammond

---------------------------------------------------------------------------
LINGUIST List: Vol-8-135


