8.66, Disc: Parsers
linguist at linguistlist.org
Tue Jan 21 21:28:42 UTC 1997
LINGUIST List: Vol-8-66. Tue Jan 21 1997. ISSN: 1068-4875.
Subject: 8.66, Disc: Parsers
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>
Review Editor: Andrew Carnie <carnie at linguistlist.org>
Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
Ann Dizdar <ann at linguistlist.org>
Assistant Editor: Sue Robinson <sue at linguistlist.org>
Technical Editor: Ron Reck <ron at linguistlist.org>
Software development: John H. Remmers <remmers at emunix.emich.edu>
Zhiping Zheng <zzheng at online.emich.edu>
Editor for this issue: Helen Dry <hdry at emunix.emich.edu>
=================================Directory=================================
1)
Date: Tue, 31 Dec 1996 19:10:00 -0500
From: Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject: Link Grammar parser
2)
Date: Wed, 01 Jan 1997 16:34:46 -0500
From: Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject: PARSER COMPARISON
-------------------------------- Message 1 -------------------------------
Date: Tue, 31 Dec 1996 19:10:00 -0500
From: Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject: Link Grammar parser
I want to thank Derek Bickerton for putting his system up on the web for
evaluation. (Although I was unable to get through to it.)
In his list of parsers on the web, he neglected to mention the Link
Grammar system originally developed by Davy Temperley and me (and
subsequently extended by John Lafferty and Dennis Grinberg). To try
it, go to my home page:
http://www.cs.cmu.edu/~sleator
and choose the "link grammar" option. You'll be able to test the
system, see a list of English phenomena it covers, get full
documentation, and see references to other work built upon link
grammars. The grammatical theory upon which this is based has been
published, and the source code and grammar of our system are available
via anonymous FTP.
Some other preliminary demonstrations of work going on in our group can
be found by following the "natural language playground" link from
http://www.cs.cmu.edu/~dougb.
Daniel Sleator Office: 412-268-7563 (Fax: 412-268-5576)
Professor of Computer Science Home: 412-362-8675 (Fax: 412-362-4443)
Carnegie Mellon University http://www.cs.cmu.edu/~sleator
Pittsburgh, PA 15213 sleator at cs.cmu.edu
-------------------------------- Message 2 -------------------------------
Date: Wed, 01 Jan 1997 16:34:46 -0500
From: Daniel Sleator <Daniel_Sleator at bobo.link.cs.cmu.edu>
Subject: PARSER COMPARISON
Philip Bralich suggests that those of us working in the area of parsing
should make our systems available via the web. Davy Temperley and I are
in full agreement with this. That's why a demonstration of our link
grammar system has been up on the web for over a year. Go to
"www.cs.cmu.edu/~sleator" and click on "link grammar" to get to the
parser page.
Philip has also proposed a set of criteria by which parsing systems can
be judged:
> In addition to using a dictionary that is at least 25,000 words in
> size and working in real time and handling sentences up to 12 or 14
> words in length (the size required for most commercial applications),
> we suggest that parsers should also meet the following standards
> before engaging this challenge:
>
> At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
> STRINGS, the parser should: 1) identify parts of speech, 2) identify
> parts of sentence, 3) identify internal clauses, 4) identify sentence
> type (without using punctuation), and 5) identify tense and voice in
> main and internal clauses.
>
> At a minimum from the point of view of EVALUATION OF STRINGS, the
> parser should: 1) recognize acceptable strings, 2) reject unacceptable
> strings, 3) give the number of correct parses identified, 4) identify
> what sort of items succeeded (e.g. sentences, noun phrases, adjective
> phrases, etc), 5) give the number of unacceptable parses that were
> tried, and 6) give the exact time of the parse in seconds.
>
> At a minimum, from the point of view of MANIPULATION OF STRINGS, the
> parser should: 1) change questions to statements and statements to
> questions, 2) change actives to passives in statements and questions
> and change passives to actives in statements and questions, and 3)
> change tense in statements and questions.
Whether or not anybody else agrees that these are the right desiderata,
it's useful that he's put them forward. We can use them to evaluate
our own work, and Bralich's work as well. We have done this, and
it seems to us that our system is superior to Bralich's.
The version of link grammar that we have put up on the web already does
very well in a number of these criteria. Regarding STRUCTURAL ANALYSIS,
the parser outputs a representation of a sentence which contains much of
the information discussed by Bralich. Parts of speech are shown
explicitly; things like constituent structure are virtually explicit
(for example, a subject phrase is anything that is on the left end of an
"S" link). Tense and aspect are not explicit in the output, but they
could quite easily be recovered. Regarding EVALUATION OF STRINGS, our
system is far superior to the Ergo parser. Our system does an excellent
job of distinguishing acceptable from unacceptable sentences.
Furthermore, it is often able to obtain useful structural information
from non-grammatical sentences, by making use of "null-links". Below we
discuss some basic problems with the Ergo parser regarding its
evaluation and analysis of sentences. We have not implemented a
MANIPULATION OF STRINGS component. We have, however, worked out a
sentence-constructing mechanism that we believe would be able to handle
this as well. Of course, we'll have to do the work to make this convincing. We
may be inspired to add this feature as a result of these discussions.
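To illustrate the remark above that a subject phrase is "anything that is on the left end of an S link", here is a minimal sketch in Python. It is not the parser's actual output format or code: the linkage is written by hand, the tuple representation and the function name subject_phrase are assumptions made for illustration, and the labels other than "S" are illustrative. It also assumes the linkage has no cycle running through the S link.

```python
from collections import defaultdict

def subject_phrase(words, links):
    """Return the words reachable from the left end of the first "S" link
    without crossing that link, in sentence order (None if no "S" link)."""
    s_links = [(left, right) for label, left, right in links if label == "S"]
    if not s_links:
        return None
    head, verb = s_links[0]

    # Build an undirected adjacency map, leaving out the S link itself.
    neighbours = defaultdict(set)
    for _label, left, right in links:
        if (left, right) == (head, verb):
            continue
        neighbours[left].add(right)
        neighbours[right].add(left)

    # Depth-first walk starting from the subject head word.
    seen = {head}
    stack = [head]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return [words[i] for i in sorted(seen)]

# Hand-written linkage (hypothetical): (label, left word index, right word index)
WORDS = ["the", "big", "dog", "chased", "a", "cat"]
LINKS = [
    ("D", 0, 2),  # the -- dog
    ("A", 1, 2),  # big -- dog
    ("S", 2, 3),  # dog -- chased
    ("O", 3, 5),  # chased -- cat
    ("D", 4, 5),  # a -- cat
]

print(subject_phrase(WORDS, LINKS))
```

For this hand-written linkage the function collects "the big dog". Recovering tense would need more information than this sketch carries.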
Bralich's aim is to build a parser that will be useful for interactive
games and other applications. It is therefore restricted to short
sentences, and has a fairly small vocabulary. However, even with these
constraints, there are a number of very basic constructions that his
parser cannot handle. Here are some examples. All of the sentences below
are simply rejected by his parser.
I went out              The parser does not allow two-word verbs
He came in              like "set up", "go out", "put in", which are
He sent it off          extremely common.
I set it up

He did it quickly       The parser seems to have extremely limited
                        use of adverbs. (It does accept some
                        constructions of this type, like "He ran
                        quickly", so perhaps this is a bug.)

John and Fred are here  The parser does not know that conjoined
                        singular noun phrases take plural verbs.

The dog jumped and the  The parser does not seem to
cat ran                 accept ANY sentences in which clauses
                        are joined with conjunctions.

He said he was coming   The parser accepts "He said THAT he was
                        coming"; but it does not allow deletion of
                        "THAT", which is extremely common with some
                        verbs.

I made him angry        There are a number of kinds of verb
I saw him leave         complements which the parser does
I suggested he go       not handle: direct object + adjective
                        ("I made him angry"), direct object +
                        infinitive ("I saw him leave"),
                        subjunctive ("I suggested [that] he go").

His attempt to do it    The parser cannot handle nouns that take
was a failure           infinitives.

I went to the store     The parser cannot handle the extremely
to get some milk        common use of infinitive phrases meaning
                        "in order to".
There are also cases where the parser assigns the wrong interpretation
to sentences. One of the biggest problems here is in the treatment of
verbs. Verbs in English take many different kinds of complements: direct
objects, infinitives, clauses, indirect questions, adjectives, object +
clause, and so on. The Ergo Parser seems to treat all of these
complements as direct objects, and makes no distinctions between which
verbs take which kind. This means, in the first place, that it will
accept all kinds of strange sentences like "I chased that he came",
blithely labeling the embedded clause as an object of "chased". More
seriously, this often causes it to assign the wrong interpretation to
sentences. For example,
    I left when he came
The verb "left" can be either transitive or intransitive. Here, it is
clearly being used intransitively, with "when he came" acting as a
subordinate clause. But the Ergo Parser treats "when he came" as a
direct object.
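The point about verbs selecting different kinds of complements can be sketched with a toy lexicon in Python. This is only an illustration of the idea, not a description of how either parser works: the dictionary SUBCAT, the complement names, and the function accepts are all hypothetical, and the entries cover just a few of the verbs mentioned above.

```python
# Toy subcategorization lexicon (hypothetical): each verb lists the
# complement types it allows. "none" marks an intransitive use.
SUBCAT = {
    "chased": {"direct_object"},
    "said": {"that_clause", "bare_clause"},
    "left": {"none", "direct_object"},
    "made": {"direct_object", "object_adjective"},
}

def accepts(verb, complement):
    """Return True if the toy lexicon lets `verb` take `complement`."""
    return complement in SUBCAT.get(verb, set())

# "I chased that he came": "chased" does not take a clause.
print(accepts("chased", "that_clause"))
# "He said he was coming": "said" allows a clause without "that".
print(accepts("said", "bare_clause"))
# "I left when he came": "left" can be intransitive, so the
# "when" clause need not be treated as its object.
print(accepts("left", "none"))
```

With a lexicon of this kind, a parser has grounds to reject "I chased that he came" and to read "when he came" as a subordinate clause, not as an object of "left".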
The program does not seem to analyze relative clauses at all. In
the sentence
    The dog I saw was black
the parser states that "I" is the subject of "saw", and that "The dog I
saw" is the subject of "was", but does not state that "dog" is the
object of "saw". The program also accepts "The dog I died was black"
(analyzing it in the same way), further indicating that it simply has no
understanding of relative clauses.
In the sentence "How big is it", the program analyzes "how big" as the
subject of the sentence.
We were able to identify all these problems with the Ergo parser without
knowing anything about how it works -- the formalism used is
proprietary. A plethora of new problems would probably emerge if we
knew how it worked. And all of these problems will probably be
exacerbated with longer sentences.
All of these problems with the Ergo Parser - constructions that it does
not accept, and things that it mis-analyzes - are things that our system
handles well. Indeed, the _original_ 1991 version of our parser could
handle all these things. In our version 2.0, released in 1995, we
incorporate many constructions which are less common. We should point
out that even the latest version of our parser is far from perfect. It
finds complete, correct parses for about 80% of Wall Street Journal
sentences.
The reader can try both systems for himself or herself, and come to
his/her own conclusions. (The Ergo parser is at www.ergo-ling.com, ours
is at www.cs.cmu.edu/~sleator.)
Daniel Sleator <sleator at cs.cmu.edu>
Davy Temperley <dt3 at columbia.edu>
---------------------------------------------------------------------------
LINGUIST List: Vol-8-66