
LINGUIST List:  Vol-9-395. Wed Mar 18 1998. ISSN: 1068-4875.

Subject: 9.395, Disc: NLP and Syntax

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Editors:            Brett Churchill <brett at linguistlist.org>
                    Martin Jacobsen <marty at linguistlist.org>
                    Elaine Halleck <elaine at linguistlist.org>
                    Anita Huang <anita at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>
                    Julie Wilson <julie at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Martin Jacobsen <marty at linguistlist.org>

=================================Directory=================================

1)
Date:  Mon, 16 Mar 1998 13:50:45 -0500 (EST)
From:  "Samuel L. Bayer" <sam at linus.mitre.org>
Subject:  Re: 9.383: Disc: NLP and Syntax

2)
Date:  Mon, 16 Mar 1998 16:22 -0500 (EST)
From:  Mike_Maxwell at sil.org
Subject:  Disc: NLP and Syntax

-------------------------------- Message 1 -------------------------------

Date:  Mon, 16 Mar 1998 13:50:45 -0500 (EST)
From:  "Samuel L. Bayer" <sam at linus.mitre.org>
Subject:  Re: 9.383: Disc: NLP and Syntax


Phil Bralich wrote:

> Let me illustrate with some quotes from the MUC-6 web page which
> outlines the tasks to be accomplished.  To see this site yourself go
> to http://cs.nyu.edu/cs/faculty/grishman/muc6.html (I have no idea
> why no one in these discussions is providing the relevant URLs
> besides me).

> You will note in the following that there is no concern whatsoever
> for the ability to do a constituent analysis of a tree as that is
> precisely what is being avoided.  Certainly any amount of
> constituent analysis would be of value and would not be excluded but
> there seems to be an awareness that it is not available so different
> criteria are chosen.  Note also that information being extracted is
> not described in terms of phrases or clauses.

> [excerpt deleted]

[ ... and later ... ]

> However, having demonstrated with a 75,000 word dictionary and a
> parser that does a wide variety of new functions, we can no longer
> dismiss theoretical syntax.  Certainly, IE and IR will benefit
> greatly when we extend our current tools to those areas because we
> provide so much more information about the environments of the
> "named entities" and so on.

[ ... and later ... ]

> Here you are talking about doing something useful with huge numbers
> of documents of unrestricted text whereas I am speaking (primarily)
> about doing question/answer, statement/response repartee, grammar
> checking, the improvement of machine translation devices and of
> course about a significant, overnight improvement in navigation
> and control devices.  Nothing in the MUC standards speaks to any of
> this.  The MUC standards are actually quite narrow compared to the
> very wide realm of what is possible with NLP.

> This problem is not unknown in the field. Take a look at what is
> said in _The State of the Art in Human Language Technology_ (Ron
> Cole, Editor-in-Chief), a 1996 report commissioned by the National
> Science Foundation.  In this report, Ted Briscoe (Section 3.7, p. 1)
> states, "Despite over three decades of research effort, no practical
> domain independent parser of unrestricted text has been developed."
> In addition, in that same report, Hans Uszkoreit and Anne Zaenen
> state (Section 3.1, p. 1), "Currently, no methods exist for
> efficient distributed grammar engineering [parsing].  This
> constitutes a serious bottleneck in the development of language
> technology products."

> Thus, while there is some IE and IR happening without parsers, there
> are hundreds of other possible technologies that cannot be developed
> with the standards used by MUC.  To create these other technologies
> it is necessary to meet standards just like those I have proposed or
> the bottleneck will not be broken.  In addition, all IR and IE
> technologies will be significantly improved once these NLP tools are
> brought to bear on that domain.

[ ... and later ... ]

> While there is a lack of precision in some of them, I don't think it
> is at all a problem to expect a system to label tense, change active
> to passive or passive to active or to be able to answer a simple
> question. I also do not find that spectacularly undefined.

[ ... and later ... ]

> All I suggest is that the reader go to the MUC page himself (URL
> given above) and decide for himself.  The tasks that IR and IE set
> out for themselves may be of some value in that very limited domain,
> but they have absolutely no applicability to the development of
> other NL tools such as Q&A, machine translation, and so on.  In order
> to approach these other areas you absolutely have to have a parser.

[ ... and finally, in response to someone else ... ]

> However, I do not see how anyone could come even close to meeting
> the standards I propose without first having a fully worked out
> theory of syntax.  Even if programmers were the ones to develop a
> program that met those standards we would have to admit that
> somewhere in those lines of code was a true theory of syntax.  Even
> if you just created a huge series of jury-rigs, either they would
> not work or they would merge into a theory.  The phenomena to be
> described are complex, subtle, and intricate.  Only a completely
> worked out theory of syntax will result in such programs.

First, I welcome Dr. Bralich's more subdued tone. But I still disagree
with him, and I think that the final paragraph I've cited here
illustrates the crux of our disagreement. I agree, of course, that
there is a theory of language, no matter how simple and crude,
embedded in ANY language-processing system. There used to be people in
NLP who denied that, and claimed that their approaches were purely
statistical, but I'm pretty sure most of those folks have finally
admitted that such a goal is impossible (and counterproductive). And I
certainly agree that the MUC criteria for evaluation bear on a tiny
subset of the potential applications of language processing
systems. Dr. Bralich lists a number of other potential applications,
which I've repeated above. However, these observations do NOT, taken
together, imply that Dr. Bralich's standards are very useful; the
problem is that Dr. Bralich cannot conceive of a situation in which
his desiderata do not entail his conclusions.

For instance, Dr. Bralich lists question-answering as one of the
potential applications which MUC does not address. This is
true. However, the DARPA community HAS addressed question-answering
systems in a related evaluation: the ATIS air travel reservation
domain for spoken language understanding systems. In this evaluation,
an attempt was made to define a syntactic/shallow semantic level of
evaluation; the effort was called SEMEVAL, and it failed
miserably. The problem was that there are too many different theories
of language, and settling on an intermediate level of evaluation which
embraced a particular one (and it was of course impossible to settle
on such a level without embracing a particular one) was
counterproductive and irrelevant, for a number of reasons:

(1) the point was answering the question, not parsing the
sentence. The intermediate representation was unimportant as an
evaluation criterion.

(2) there were many people who were using a simple frame-based
approach which did not use an articulated intermediate representation
(see the sketch after this list), and they (rightly) observed that
they would be unfairly penalized for using an alternative approach to
reaching the same goal.

(3) the range of syntactic constructions encountered in a corpus of
14,000 spontaneous utterances on air travel is pretty large, and
linguistic theories do not yet address the majority of them. In other
words, not only was the proposed evaluation level unimportant and
biased, it was also impossible to determine the "right" answer,
because we don't know it yet.
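
To make (2) concrete, here is a minimal sketch, in Python, of the
frame-based style of system in question. The slot patterns and the toy
flight table are invented purely for illustration; this is not code
from any actual ATIS participant.

# Frame-based question answering, ATIS-style, in miniature.
import re

FLIGHTS = [  # hypothetical data
    {"id": "UA123", "origin": "boston", "destination": "denver", "day": "tuesday"},
    {"id": "UA456", "origin": "boston", "destination": "denver", "day": "friday"},
]

def fill_frame(utterance):
    """Fill a flat frame of slots straight from the word string; no
    constituent structure or other intermediate syntactic representation
    is built along the way."""
    text = utterance.lower()
    frame = {}
    for slot, pattern in [("origin", r"\bfrom (\w+)"),
                          ("destination", r"\bto (\w+)"),
                          ("day", r"\bon (\w+day)\b")]:
        match = re.search(pattern, text)
        if match:
            frame[slot] = match.group(1)
    return frame

def answer(utterance):
    frame = fill_frame(utterance)
    # What gets evaluated is whether the right flights come back,
    # not whether any particular parse was assigned.
    return [f["id"] for f in FLIGHTS
            if all(f.get(slot) == value for slot, value in frame.items())]

print(answer("Show me flights from Boston to Denver on Tuesday"))  # ['UA123']

A system built this way either answers the question or it does not; an
evaluation pitched at an intermediate syntactic level has nothing to
say about it.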

All these objections apply to Dr. Bralich's proposed criteria.

I sympathize tremendously with Dr. Bralich's goals. However, the
problem of standards for language processing systems is far more
complex than he's anticipating. There are three purposes one might
conjure up for defining such standards:

(a) it helps determine whether systems are behaving in linguistically
well-justified ways

(b) it allows one to compare systems

(c) it contributes to the determination of whether these systems can
contribute to tasks which require linguistic processing

The problem is that (a) is ill-defined, there's no evidence that it
bears any relationship to (c), and it bears on (b) only if you're
trying to evaluate a system without a task in mind (which turns out to
be nearly impossible). Dr. Bralich seems unable to imagine a
scenario in which (a) and (c) do not entail each other, and my point
is that the entire history of evaluation of language systems has
failed to demonstrate that they are more than peripherally connected.

[Just to clarify, I say (a) is ill-defined because theories currently
vastly underanalyze the available data, and conflict on the data that
they do analyze; resolving these conflicts MIGHT be useful for
evaluation if (a) were to imply (c) and there were no other way to get
to (c), but this has never been shown.]

Now, like Dr. Bralich, I don't believe for a second that we can get
100% of language analysis for ANY application without a detailed
theory of syntax; but (a) I don't really care which one it is (well, I
do, but not for the purposes of this discussion :-) ) and (b) we're
nowhere near 100% analysis for any task (and demonstrating that
Dr. Bralich's system matches the Penn Treebank at 100% accuracy
indicates nothing relative to this goal), and committing to
Dr. Bralich's criteria counterproductively biases the path that we
take toward this goal.

By way of conclusion, let me elaborate on this final point. Tremendous
progress has been made in computational linguistics over the last ten
years by ABANDONING the commonly-held convictions in theoretical
linguistics on how to make progress in this area. Theoretically
well-motivated systems perform no better, and in many situations
perform more poorly (both in speed and accuracy), than
pragmatically-constructed systems which fundamentally change the
assumptions about how language research is to be conducted. These
systems have a theory of language behind them; it's just a sort of
theory which theoretical linguists aren't very interested
in. Dr. Bralich's standards impose a bias from theoretical linguistics
on what systems MUST do in order to be successful; so far, this bias
has been demonstrated by the computational linguistics community to be
false and counterproductive. I personally think this result has vast
implications for linguistic theory, and I would hate to see a set of
standards adopted which effectively eliminated this alternative branch
of investigation.

Samuel Bayer
The MITRE Corporation


-------------------------------- Message 2 -------------------------------

Date:  Mon, 16 Mar 1998 16:22 -0500 (EST)
From:  Mike_Maxwell at sil.org
Subject:  Disc: NLP and Syntax

In vol-9-383, Philip A. Bralich <bralich at hawaii.edu> wrote:

>...at some point we have to step back and ask ourselves which
>theories, whatever their motivation, are capable of accounting for
>the data.  That is, which theories after three decades of trying have
>done the best job, and how can we demonstrate that?  ...Nor am I
>saying they will not at some point arrive at a proper description of
>the data that does indeed meet their goals (whether that be a
>description of processing, acquisition, or whatever) but I am saying
>that as long as they do not have a demonstrably satisfying account of
>the basics, they can make no strong claim to being a mature or
>effective theory.  They can call themselves psychologically
>motivated, or learning theory motivated or whatever, but they cannot
>make many claims to being able to account for the data.

[Bralich quotes Pius ten Hacken:]

>>1. a linguistic theory which does not give a description of all the
>>phenomena Bralich's parser covers need not be a bad theory;

[Bralich again:]
>Correct, but it is also not a mature theory.

"The data" which a theory might claim to be able to "account for"
encompasses a wide range of phenomena.  Quite apart from the fact that
Bralich has only claimed that his parser + grammar works for English,
Chomsky long ago (1965, in "Aspects of the Theory of Syntax") outlined
three levels of adequacy for theories, namely observational,
descriptive, and explanatory.  So far as I have seen in this
discussion, Bralich is saying his parser + grammar does a good job at
observational adequacy--this is what he appears to be calling "the
basics"--but I have seen nothing about descriptive or explanatory
adequacy.  Over the years, there have been many parser + grammar
combinations that have achieved a reasonably high degree of
observational adequacy.  (The fact that they have not been put "in the
reviewers' hands", as Bralich says they should be, does not negate this
claim.  Many of these programs are proprietary to the companies that
developed them, and they are not about to release them as stand-alone
parsers.)  But observational adequacy is not the only goal for most
modern theories of linguistics, and a theory which achieved only
observational adequacy--if there were such--cannot claim to be a
mature theory, either.

So why not try to compare parsing programs on the level of descriptive
adequacy?  Because that presupposes we know what the correct analysis
of every sentence in some large (and "complete") test corpus is, and
we don't.  That isn't to say people haven't tried to do such
comparisons over the years, just that it has turned out to be more
difficult than expected.  While many parsing programs achieve
reasonably similar levels of observational adequacy, i.e. they
recognize more or less the same sets of sentences, they do not assign
comparable constituent structures, and it is not apparent which
constituent structure is correct.  (Or worse for purposes of
comparison, an LFG parser, say, will assign two different sorts of
parallel structures where a GPSG parser will assign only one.)
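
To see why such a comparison is slippery, consider a minimal sketch,
in Python, of the sort of bracket-overlap scoring (in the spirit of
the PARSEVAL measures used with the Penn Treebank, where the standard
tool is evalb) that a head-to-head comparison would have to fall back
on. The two analyses below are invented for illustration: both
"recognize" the sentence, but they attach the PP differently and so
share only some of their brackets.

# Labeled constituent spans and bracket overlap for two parses of
# "I saw the man with the telescope" (invented analyses).

def spans(tree, start=0):
    """Return the labeled spans (label, start, end) of a tree written as
    nested (label, child, ...) tuples with words as strings at the leaves."""
    if isinstance(tree, str):               # a word fills one string position
        return set(), start + 1
    label, *children = tree
    found, pos = set(), start
    for child in children:
        child_spans, pos = spans(child, pos)
        found |= child_spans
    found.add((label, start, pos))
    return found, pos

def bracket_overlap(parse1, parse2):
    s1, _ = spans(parse1)
    s2, _ = spans(parse2)
    hits = len(s1 & s2)
    # share of each parse's brackets that also appear in the other
    return hits / len(s1), hits / len(s2)

parse_a = ("S", ("NP", "I"),
                ("VP", "saw",
                       ("NP", ("NP", "the", "man"),
                              ("PP", "with", ("NP", "the", "telescope")))))
parse_b = ("S", ("NP", "I"),
                ("VP", "saw",
                       ("NP", "the", "man"),
                       ("PP", "with", ("NP", "the", "telescope"))))

print(bracket_overlap(parse_a, parse_b))    # roughly (0.86, 1.0)

Until someone decides which bracketing counts as correct, there is no
gold standard to score a third parser against.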

I could give a long listing here of syntactic constructions of English
for which the appropriate structure is not apparent, but I'll content
myself with one example: In the sentence "I wonder who came", is there
a gap after "who"?  (For that matter, is there a gap, in the sense of
a phrasal node which does not dominate any terminals, in _any_
wh-construction?)
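
For that one example, the two candidate structures might be written,
as illustrative nested tuples again, like this (the labels are not
drawn from any particular published grammar):

# Two candidate analyses of "I wonder who came", differing only in whether
# the embedded clause contains a gap (a phrasal node dominating no
# terminal) after "who".

with_gap = ("S", ("NP", "I"),
                 ("VP", "wonder",
                        ("CP", ("NP", "who"),
                               ("S", ("NP",),          # the gap: an NP with nothing under it
                                     ("VP", "came")))))

without_gap = ("S", ("NP", "I"),
                    ("VP", "wonder",
                           ("CP", ("NP", "who"),
                                  ("VP", "came"))))

# Whichever structure a test corpus adopts becomes the "right" answer that
# parsers are scored against; the corpus cannot remain neutral on the question.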

It goes without saying that parsing programs do not attempt to achieve
explanatory adequacy.  There are computer programs which have made
forays into this area, but that's a different topic.  Apart from those
attempts, explanatory adequacy has been the exclusive realm of
theoretical linguists.

In summary, after "three decades of trying", there is no "mature
theory" of syntax, not even whatever theory underlies Bralich's
grammar.  Look at it this way: it's employment security.

                            Mike Maxwell
                            Mike_Maxwell at sil.org

---------------------------------------------------------------------------
LINGUIST List: Vol-9-395


