8.66, Disc: Parsers

Tue Jan 21 21:28:42 UTC 1997

LINGUIST List:  Vol-8-66. Tue Jan 21 1997. ISSN: 1068-4875.

Subject: 8.66, Disc: Parsers

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Sue Robinson <sue at linguistlist.org>
Technical Editor:  Ron Reck <ron at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Editor for this issue: Helen Dry <hdry at emunix.emich.edu>

=================================Directory=================================

1)
Date:  Tue, 31 Dec 1996 19:10:00 -0500
From:  Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject:  Link Grammar parser

2)
Date:  Wed, 01 Jan 1997 16:34:46 -0500
From:  Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject:  PARSER COMPARISON

-------------------------------- Message 1 -------------------------------

Date:  Tue, 31 Dec 1996 19:10:00 -0500
From:  Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject:  Link Grammar parser

I want to thank Derek Bickerton for putting his system up on the web for
evaluation.  (Although I was unable to get through to it.)

In his list of parsers on the web, he neglected to mention the Link
Grammar system originally developed by Davy Temperley and me (and
subsequently extended by John Lafferty and Dennis Grinberg).  To try
it, go to my home page:

                   http://www.cs.cmu.edu/~sleator

and choose the "link grammar" option.  You'll be able to test the
system, see a list of English phenomena it covers, get full
documentation, and see references to other work built upon link
grammars.  The grammatical theory upon which this is based has been
published, and the source code and grammar of our system are available
via anonymous FTP.

Some other preliminary demonstrations of work going on in our group can
be found by following the "natural language playground" link from
http://www.cs.cmu.edu/~dougb.

  Daniel Sleator                 Office: 412-268-7563 (Fax: 412-268-5576)
  Professor of Computer Science  Home:   412-362-8675 (Fax: 412-362-4443)
  Carnegie Mellon University     http://www.cs.cmu.edu/~sleator
  Pittsburgh, PA 15213           sleator at cs.cmu.edu

-------------------------------- Message 2 -------------------------------

Date:  Wed, 01 Jan 1997 16:34:46 -0500
From:  Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject:  PARSER COMPARISON

Philip Bralich suggests that those of us working in the area of parsing
should make our systems available via the web.  Davy Temperley and I are
in full agreement with this.  That's why a demonstration of our link
grammar system has been up on the web for over a year.  Go to
"www.cs.cmu.edu/~sleator" and click on "link grammar" to get to the
parser page.

Philip has also proposed a set of criteria by which parsing systems can
be judged:

> In addition to using a dictionary that is at least 25,000 words in
> size and working in real time and handling sentences up to 12 or 14
> words in length (the size required for most commercial applications),
> we suggest that parsers should also meet the following standards
> before engaging this challenge:
>
> At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
> STRINGS, the parser should:, 1) identify parts of speech, 2) identify
> parts of sentence, 3) identify internal clauses, 4) identify sentence
> type (without using punctuation), and 5) identify tense and voice in
> main and internal clauses.
>
> At a minimum from the point of view of EVALUATION OF STRINGS, the
> parser should: 1) recognize acceptable strings, 2) reject unacceptable
> strings, 3) give the number of correct parses identified, 4) identify
> what sort of items succeeded (e.g. sentences, noun phrases, adjective
> phrases, etc), 5) give the number of unacceptable parses that were
> tried, and 6) give the exact time of the parse in seconds.
>
> At a minimum, from the point of view of MANIPULATION OF STRINGS, the
> parser should: 1) change questions to statements and statements to
> questions, 2) change actives to passives in statements and questions
> and change passives to actives in statements and questions, and 3)
> change tense in statements and questions.

Whether or not anybody else agrees that these are the right desiderata,
it's useful that he's put them forward.  We can use them to evaluate
our own work, and Bralich's work as well.  We have done this, and
it seems to us that our system is superior to Bralich's.

The version of link grammar that we have put up on the web already does
very well on a number of these criteria. Regarding STRUCTURAL ANALYSIS,
the parser outputs a representation of a sentence which contains much of
the information discussed by Bralich. Parts of speech are shown
explicitly; things like constituent structure are virtually explicit
(for example, a subject phrase is anything that is on the left end of an
"S" link). Tense and aspect are not explicit in the output, but they
could quite easily be recovered. Regarding EVALUATION OF STRINGS, our
system is far superior to the Ergo parser. Our system does an excellent
job of distinguishing acceptable from unacceptable sentences.
Furthermore, it is often able to obtain useful structural information
from non-grammatical sentences, by making use of "null-links".  Below we
discuss some basic problems with the Ergo parser regarding its
evaluation and analysis of sentences.  We have not implemented a
MANIPULATION OF STRINGS component. We have worked out a
sentence-constructing mechanism that we believe would be able to handle this as
well.  Of course we'll have to do the work to make this convincing.  We
may be inspired to add this feature as a result of these discussions.
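
To make the remark about "S" links concrete, here is a minimal sketch of
how a subject phrase might be read off a linkage. It is an illustration
only, not the parser's actual code or output format: the triple
representation (left word index, right word index, label), the function
name subject_phrases, and the hand-written linkage are all assumptions,
and the label inventory is reduced to a bare "S".

```python
from collections import defaultdict

def subject_phrases(words, links):
    """For each "S" link, return the words hanging off its left end.

    `links` is a list of (left_index, right_index, label) triples.
    This is a hypothetical, simplified encoding of a linkage.
    """
    phrases = []
    for s_link in links:
        left, _right, label = s_link
        if label != "S":
            continue
        # Build the link graph with this one S link removed.
        neighbours = defaultdict(set)
        for link in links:
            if link == s_link:
                continue
            a, b, _ = link
            neighbours[a].add(b)
            neighbours[b].add(a)
        # Everything still reachable from the left end is the subject phrase.
        seen = {left}
        stack = [left]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        phrases.append(" ".join(words[i] for i in sorted(seen)))
    return phrases

# Hand-written linkage for illustration (not produced by the parser).
words = ["the", "big", "dog", "chased", "the", "cat"]
links = [(0, 2, "D"), (1, 2, "A"), (2, 3, "S"), (3, 5, "O"), (4, 5, "D")]
print(subject_phrases(words, links))  # ['the big dog']
```

The sketch assumes that removing the S link separates the subject from
the rest of the sentence, which holds for this toy linkage but need not
hold for every linkage that contains cycles.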

Bralich's aim is to build a parser that will be useful for interactive
games and other applications. It is therefore restricted to short
sentences, and has a fairly small vocabulary.  However, even with these
constraints, there are a number of very basic constructions that his
parser cannot handle. Here are some examples. All of the sentences below
are simply rejected by his parser.

	I went out		The parser does not allow two-word verbs
	He came in		like "set up", "go out", "put in", which are
	He sent it off		extremely common.
	I set it up

	He did it quickly	The parser seems to have extremely limited
				use of adverbs. (It does accept some
				constructions of this type, like "He ran
				quickly", so perhaps this is a bug.)

	John and Fred are here	The parser does not know that conjoined
				singular noun phrases take plural verbs.

	The dog jumped and the  The parser does not seem to
	cat ran			accept ANY sentences in which clauses
				are joined with conjunctions.

	He said he was coming	The parser accepts "He said THAT he was
				coming"; but it does not allow deletion of
				"THAT", which is extremely common with some
				verbs.

	I made him angry	There are a number of kinds of verb
	I saw him leave		complements which the parser does
	I suggested he go	not handle: direct object + adjective
				("I made him angry"), direct object +
				infinitive ("I saw him leave"),
				subjunctive ("I suggested [that] he go").

	His attempt to do it	The parser cannot handle nouns that take
	was a failure		infinitives.

	I went to the store 	The parser cannot handle the extremely
	to get some milk	common use of infinitive phrases meaning
				"in order to".
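
As an illustration of the agreement point in the "John and Fred are
here" example above, the rule can be sketched in a few lines. This is a
toy, not a description of either parser: the function names, the
"singular"/"plural" strings, and the tiny verb table are assumptions
made for the example, and the fallback for other conjunctions (the
nearest conjunct decides) is only a rough approximation.

```python
# Hypothetical mini-lexicon: the number each verb form agrees with.
VERB_NUMBER = {"is": "singular", "are": "plural",
               "was": "singular", "were": "plural"}

def subject_number(conjunct_numbers, conjunction=None):
    """Number of a subject made of one or more noun phrases."""
    if len(conjunct_numbers) > 1 and conjunction == "and":
        # Noun phrases joined by "and" form a plural subject,
        # even when each conjunct is singular.
        return "plural"
    # Otherwise let the last (nearest) conjunct decide.
    return conjunct_numbers[-1]

def agrees(conjunct_numbers, conjunction, verb):
    """True if the verb form matches the number of the subject."""
    return subject_number(conjunct_numbers, conjunction) == VERB_NUMBER[verb]

# "John and Fred are here" vs. "John and Fred is here"
print(agrees(["singular", "singular"], "and", "are"))  # True
print(agrees(["singular", "singular"], "and", "is"))   # False
```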

There are also cases where the parser assigns the wrong interpretation
to sentences. One of the biggest problems here is in the treatment of
verbs. Verbs in English take many different kinds of complements: direct
objects, infinitives, clauses, indirect questions, adjectives, object +
clause, and so on. The Ergo Parser seems to treat all of these
complements as direct objects, and makes no distinction as to which
verbs take which kind. This means, in the first place, that it will
accept all kinds of strange sentences like "I chased that he came",
blithely labeling the embedded clause as an object of "chased". More
seriously, this often causes it to assign the wrong interpretation to
sentences. For example,

	I left when he came

The verb "left" can be either transitive or intransitive. Here, it is
clearly being used intransitively, with "when he came" acting as a
subordinate clause. But the Ergo Parser treats "when he came" as a
direct object.
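
The distinction at issue here, recording which verbs take which kinds
of complement, can be sketched as a small lookup table. This is only an
illustration under stated assumptions: the table entries, the phrase
type names, and the function attach are invented for the example and
describe neither our dictionary nor the Ergo parser, and the sketch
looks at a single phrase after the verb.

```python
# Hypothetical complement frames per verb; "none" marks an intransitive use.
FRAMES = {
    "chased": {"object"},
    "said": {"object", "that-clause"},
    "left": {"none", "object"},
}

# Phrase types that modify the clause rather than complete the verb.
ADJUNCT_TYPES = {"when-clause"}

def attach(verb, phrase_type):
    """Decide how the phrase following a verb relates to it.

    Returns "complement", "adjunct", or None if the combination
    is not licensed by the table.
    """
    frames = FRAMES.get(verb, set())
    if phrase_type in frames:
        return "complement"
    if phrase_type in ADJUNCT_TYPES and "none" in frames:
        # The verb is used intransitively; the phrase is a subordinate clause.
        return "adjunct"
    return None

print(attach("chased", "that-clause"))  # None: "I chased that he came"
print(attach("left", "when-clause"))    # adjunct: "I left when he came"
print(attach("said", "that-clause"))    # complement: "He said that he was coming"
```

With a table of this kind, a clause after "chased" is rejected rather
than labeled as an object, and "when he came" is attached to
intransitive "left" as a subordinate clause.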

The program does not seem to analyze relative clauses at all. In
the sentence

	The dog I saw was black

the parser states that "I" is the subject of "saw", and that "The dog I
saw" is the subject of "was", but does not state that "dog" is the
object of "saw". The program also accepts "The dog I died was black"
(analyzing it in the same way), further indicating that it simply has no
understanding of relative clauses.

In the sentence "How big is it", the program analyzes "how big" as the
subject of the sentence.

We were able to identify all these problems with the Ergo parser without
knowing anything about how it works -- the formalism used is
proprietary.  A plethora of new problems would probably emerge if we
knew how it worked.  And all of these problems will probably be
exacerbated with longer sentences.

All of these problems with the Ergo Parser - constructions that it does
not accept, and things that it mis-analyzes - are things that our system
handles well. Indeed, the _original_ 1991 version of our parser could
handle all these things. In our version 2.0, released in 1995, we
incorporated many constructions which are less common. We should point
out that even the latest version of our parser is far from perfect. It
finds complete, correct parses for about 80% of Wall Street Journal
sentences.

The reader can try both systems for himself or herself, and come to
his/her own conclusions.  (The Ergo parser is at www.ergo-ling.com, ours
is at www.cs.cmu.edu/~sleator.)

      Daniel Sleator <sleator at cs.cmu.edu>
      Davy Temperley <dt3 at columbia.edu>

---------------------------------------------------------------------------
LINGUIST List: Vol-8-66