9.396, Disc: NLP and Syntax

The LINGUIST List linguist at linguistlist.org
Wed Mar 18 22:47:36 UTC 1998


LINGUIST List:  Vol-9-396. Wed Mar 18 1998. ISSN: 1068-4875.

Subject: 9.396, Disc: NLP and Syntax

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Editors:  	    Brett Churchill <brett at linguistlist.org>
		    Martin Jacobsen <marty at linguistlist.org>
		    Elaine Halleck <elaine at linguistlist.org>
                    Anita Huang <anita at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>
		    Julie Wilson <julie at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Martin Jacobsen <marty at linguistlist.org>

=================================Directory=================================

1)
Date:  Mon, 16 Mar 1998 13:50:45 -0500 (EST)
From:  "Samuel L. Bayer" <sam at linus.mitre.org>
Subject:  Re: 9.383: Disc: NLP and Syntax

2)
Date:  Wed, 18 Mar 1998 13:50:06 -0500 (EST)
From:  "Mark Johnson" <mj1 at lx.cog.brown.edu>
Subject:  Re: 9.342, Disc: NLP and Syntax

3)
Date:  Wed, 18 Mar 1998 17:38:07 -0500
From:  Dan Maxwell <100101.2276 at compuserve.com>
Subject:  9.368, Disc: NLP and Syntax

-------------------------------- Message 1 -------------------------------

Date:  Mon, 16 Mar 1998 13:50:45 -0500 (EST)
From:  "Samuel L. Bayer" <sam at linus.mitre.org>
Subject:  Re: 9.383: Disc: NLP and Syntax


Phil Bralich wrote:

> Let me illustrate with some quotes from the MUC-6 web page which
> outlines the tasks to be accomplished.  To see this site yourself go
> to http://cs.nyu.edu/cs/faculty/grishman/muc6.html (I have no idea
> why no one in these discussions is providing the relevant URLs
> besides me).

> You will note in the following that there is no concern whatsoever
> for the ability to do a constituent analysis of a tree as that is
> precisely what is being avoided.  Certainly any amount of
> constituent analysis would be of value and would not be excluded but
> there seems to be an awareness that it is not available so different
> criteria are chosen.  Note also that information being extracted is
> not described in terms of phrases or clauses.

> [excerpt deleted]

[ ... and later ... ]

> However, having demonstrated with a 75,000 word dictionary and a
> parser that does a wide variety of new functions, we can no longer
> dismiss theoretical syntax.  Certainly, IE and IR will benefit
> greatly when we extend our current tools to those areas because we
> provide so much more information about the environments of the
> "named entities" and so on.

[ ... and later ... ]

> Here you are talking about doing something useful with huge numbers
> of documents of unrestricted text whereas I am speaking (primarily)
> about doing question/answer, statement/response repartee, grammar
> checking, the improvement of machine translation devices and of
> course about a significant, overnight improvement in navigation
> and control devices.  Nothing in the MUC standards speaks to any of
> this.  The MUC standards are actually quite narrow compared to the
> very wide realm of what is possible with NLP.

> This problem is not unknown in the field. Take a look at what is
> said in _The State of the Art in Human Language Technology_ (Ron
> Cole, Editor-in-Chief), a 1996 report commissioned by the National
> Science Foundation.  In this report, Ted Briscoe (Section 3.7, p. 1)
> states, "Despite over three decades of research effort, no practical
> domain independent parser of unrestricted text has been developed."
> In addition, in that same report, Hans Uszkoreit and Anne Zaenen
> state (Section 3.1, p. 1), "Currently, no methods exist for
> efficient distributed grammar engineering [parsing].  This
> constitutes a serious bottleneck in the development of language
> technology products."

> Thus, while there is some IE and IR happening without parsers, there
> are hundreds of other possible technologies that cannot be developed
> with the standards used by MUC.  To create these other technologies
> it is necessary to meet standards just like those I have proposed or
> the bottleneck will not be broken.  In addition all IR and IE
> technologies will be significantly improved once these NLP tools are
> brought to bear on that domain.

[ ... and later ... ]

> While there is a lack of precision in some of them, I don't think it
> is at all a problem to expect a system to label tense, change active
> to passive or passive to active or to be able to answer a simple
> question. I also do not find that spectacularly undefined.

[ ... and later ... ]

> All I suggest is that the reader go to the MUC page himself (URL
> given above) and decide for himself.  The tasks that IR and IE set
> out for themselves may be of some value in that very limited domain,
> but they have absolutely no applicability to the development of
> other NL tools such as Q&A, machine translation, and so on.  In order
> to approach these other areas you absolutely have to have a parser.

[ ... and finally, in response to someone else ... ]

> However, I do not see how anyone could come even close to meeting
> the standards I propose without first having a fully worked out
> theory of syntax.  Even if programmers were the ones to develop a
> program that met those standards we would have to admit that
> somewhere in those lines of code was a true theory of syntax.  Even
> if you just created a huge series of jury-rigs, either they would
> not work or they would merge into a theory.  The phenomena to be
> described are complex, subtle, and intricate.  Only a completely
> worked out theory of syntax will result in such programs.

First, I welcome Dr. Bralich's more subdued tone. But I still disagree
with him, and I think that the final paragraph I've cited here
illustrates the crux of our disagreement. I agree, of course, that
there is a theory of language, no matter how simple and crude,
embedded in ANY language-processing system. There used to be people in
NLP who denied that, and claimed that their approaches were purely
statistical, but I'm pretty sure most of those folks have finally
admitted that such a goal is impossible (and counterproductive). And I
certainly agree that the MUC criteria for evaluation bear on a tiny
subset of the potential applications of language processing
systems. Dr. Bralich lists a number of other potential applications,
which I've repeated above. However, taken together, these
observations do NOT imply that Dr. Bralich's standards are very
useful, and the problem is that Dr. Bralich cannot conceive of a
situation in which his desiderata do not entail his conclusions.

For instance, Dr. Bralich lists question-answering as one of the
potential applications which MUC does not address. This is
true. However, the DARPA community HAS addressed question-answering
systems in a related evaluation: the ATIS air travel reservation
domain for spoken language understanding systems. In this evaluation,
an attempt was made to define a syntactic/shallow semantic level of
evaluation; the effort was called SEMEVAL, and it failed
miserably. The problem was that there are too many different theories
of language, and settling on an intermediate level of evaluation which
embraced a particular one (and it was of course impossible to settle
on such a level without embracing a particular one) was
counterproductive and irrelevant, for a number of reasons:

(1) the point was answering the question, not parsing the
sentence. The intermediate representation was unimportant as an
evaluation criterion.

(2) there were many people who were using a simple frame-based
approach which did not use an articulated intermediate representation,
and they (rightly) observed that they would be unfairly penalized for
using an alternative approach to reaching the same goal.

(3) the range of syntactic constructions encountered in a corpus of
14,000 spontaneous utterances on air travel is pretty large, and
linguistic theories do not yet address the majority of them. In other
words, not only was the proposed evaluation level unimportant and
biased, it was also impossible to determine the "right" answer,
because we don't know it yet.

All these objections apply to Dr. Bralich's proposed criteria.

I sympathize tremendously with Dr. Bralich's goals. However, the
problem of standards for language processing systems is far more
complex than he's anticipating. There are three purposes one might
conjure up for defining such standards:

(a) it helps determine whether systems are behaving in linguistically
well-justified ways

(b) it allows one to compare systems

(c) it contributes to the determination of whether these systems can
contribute to tasks which require linguistic processing

The problem is that (a) is ill-defined, there's no evidence that it
bears any relationship to (c), and it only bears on (b) if you're
trying to evaluate a system without a task in mind (which turns out to
be nearly impossible). Dr. Bralich seems not to be able to imagine a
scenario in which (a) and (c) do not entail each other, and my point
is that the entire history of evaluation of language systems has
failed to demonstrate that they are more than peripherally connected.

[Just to clarify, I say (a) is ill-defined because theories currently
vastly underanalyze the available data, and conflict on the data that
they do analyze; resolving these conflicts MIGHT be useful for
evaluation if (a) were to imply (c) and there were no other way to get
to (c), but this has never been shown.]

Now, like Dr. Bralich, I don't believe for a second that we can get
100% of language analysis for ANY application without a detailed
theory of syntax; but (a) I don't really care which one it is (well, I
do, but not for the purposes of this discussion :-) ) and (b) we're
nowhere near 100% analysis for any task (and demonstrating that
Dr. Bralich's system matches the Penn Treebank at 100% accuracy
indicates nothing relative to this goal), and committing to
Dr. Bralich's criteria counterproductively biases the path that we
take toward this goal.

By way of conclusion, let me elaborate on this final point. Tremendous
progress has been made in computational linguistics over the last ten
years by ABANDONING the commonly-held convictions in theoretical
linguistics on how to make progress in this area. Theoretically
well-motivated systems perform no better, and in many situations
perform more poorly (both in speed and accuracy), than
pragmatically-constructed systems which fundamentally change the
assumptions about how language research is to be conducted. These
systems have a theory of language behind them; it's just a sort of
theory which theoretical linguists aren't very interested
in. Dr. Bralich's standards impose a bias from theoretical linguistics
on what systems MUST do in order to be successful; so far, this bias
has been demonstrated by the computational linguistics community to be
false and counterproductive. I personally think this result has vast
implications for linguistic theory, and I would hate to see a set of
standards adopted which effectively eliminated this alternative branch
of investigation.

Samuel Bayer
The MITRE Corporation


-------------------------------- Message 2 -------------------------------

Date:  Wed, 18 Mar 1998 13:50:06 -0500 (EST)
From:  "Mark Johnson" <mj1 at lx.cog.brown.edu>
Subject:  Re: 9.342, Disc: NLP and Syntax


It would seem that Philip Bralich's parser is most closely related to
broad-coverage parsers like FIDDITCH (developed by Don Hindle at the
then Bell Labs), the LINK grammar developed at CMU and work by Steve
Abney (now at AT&T Labs).

As far as I know, the current best broad-coverage parsers use
statistical information, such as the ones described by Collins ``Three
Generative, Lexicalised Models for Statistical Parsing'' in the 1997
ACL conference proceedings and Charniak ``Statistical Parsing with a
Context-Free Grammar and Word Statistics'' in the 1997 AAAI conference
proceedings.

In the admittedly small world of academic research on broad-coverage
parsing, parser performance is usually evaluated by computing the
average of the labelled precision and recall scores comparing the
parser's output with the hand-constructed parse trees of a held-out
section (usually section 23) of the Penn II WSJ corpus.  This provides
a standard quantitative score of parser performance.  The authors
above claim average precision and recall scores of around 87%.  I was
wondering what average labelled precision and recall scores Philip
Bralich's parser achieves.
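[Editor's note: the labelled precision/recall scoring Johnson describes can be sketched as follows. This is a minimal illustration, not part of the original discussion; the nested-tuple tree representation and the toy sentence are invented for the example, and real evaluation tools (such as the evalb program commonly used with the Penn Treebank) additionally handle tokenization and punctuation conventions that this sketch ignores.]

```python
# Sketch of PARSEVAL-style labelled precision/recall: each tree is
# reduced to a multiset of (label, start, end) constituent spans, and
# the parser's spans are compared against the gold-standard spans.

from collections import Counter

def labelled_spans(tree):
    """Collect (label, start, end) spans from a nested-tuple tree.

    A tree is (label, child, child, ...), where leaves are word
    strings.  A Counter is used because the same labelled span can
    occur more than once (e.g. in unary chains)."""
    spans = Counter()

    def walk(node, start):
        if isinstance(node, str):          # a word token: width 1
            return start + 1
        label, children = node[0], node[1:]
        pos = start
        for child in children:
            pos = walk(child, pos)
        spans[(label, start, pos)] += 1    # record this constituent
        return pos

    walk(tree, 0)
    return spans

def precision_recall(gold_spans, test_spans):
    """Labelled precision and recall between two span multisets."""
    matched = sum((gold_spans & test_spans).values())
    precision = matched / sum(test_spans.values())
    recall = matched / sum(gold_spans.values())
    return precision, recall

# Toy example: parser output mislabels the VP as an NP, so 2 of 3
# spans match in both directions.
gold = labelled_spans(("S", ("NP", "the", "cat"), ("VP", "sat")))
test = labelled_spans(("S", ("NP", "the", "cat"), ("NP", "sat")))
p, r = precision_recall(gold, test)
```

The single quoted figure of "around 87%" in the message above is the average of precision and recall computed this way over the held-out treebank section.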

Mark Johnson
Cognitive and Linguistic Sciences
Brown University


-------------------------------- Message 3 -------------------------------

Date:  Wed, 18 Mar 1998 17:38:07 -0500
From:  Dan Maxwell <100101.2276 at compuserve.com>
Subject:  9.368, Disc: NLP and Syntax

I want to thank Sam Bayer for alerting us to the existence of detailed
evaluation criteria for parsers.  If every article in recent issues of
CL not only refers to these criteria but also tells us where they can
be found, then I agree with him in urging Phil Bralich to test
his own parser against these criteria before making such strong
claims. Since other parsers have already been tested against the MUC
criteria, this appears to provide a reasonably objective basis for
comparison.  For all I know, the Penn Treebank guidelines may be just
as good or better in some sense (it's hard to tell for sure), but I
haven't yet heard that they provide a detailed grading system like the
MUC system, nor that other parsers have been tested against them.  If
the Bracket Doctor/Ergo system really turns out to be better than the
other ones, I think many linguists would be interested in knowing more
about the underlying system.

Pius ten Hacken and Peter Menzel argue that theoretical linguistics is
right to be less concerned with data coverage than computational
linguistics, since it is more concerned with explanation in terms of
mental capacities.  It seems to me that the kind of work they are
talking about can better be considered a kind of psycholinguistics,
which by definition is concerned with the relationship between
language and the brain.  This is certainly a well established branch
of theoretical linguistics, but it is not the same thing as syntax,
which is primarily concerned with sentences (as many different kinds
as possible) and the relationships between them.  On the other hand,
ten Hacken and Menzel might prefer to join forces with functionalist
approaches to syntax, which aim to explain properties of language in
terms of historical development, ambiguity avoidance, processing, etc.

I think these are all valid approaches to language, but none of them
is the same thing as formal approaches, which to my mind have a
relationship to the rest of linguistics similar to that of mathematics
to the natural sciences -- that of a useful servant, whose task is to
provide a precise model of the object of inquiry.  If this assessment
is accurate, formal linguistics has the potential to be
important not only for computational implementation, but also for many
other areas, although computational implementation is still the best
way to find out if the analysis really works, at least if the
implementation is an accurate reflection of the grammar.

I don't follow Menzel's reasoning when he apparently argues that
because language functions are spread across various parts of the
brain, it is doubtful whether language is algorithmic.  I don't doubt the
premise of this argument.  Clearly, some, but perhaps not all, aspects
of language knowledge, production, and understanding are linked to
other activities of the brain.  But what has that got to do with the
question of whether it's algorithmic?  I even looked up the word
"algorithm" to find out whether I had misunderstood the meaning of
this word.  According to my dictionary, an algorithm is just a
procedure for doing something.  I think it is clear that every healthy
human being has an algorithm for his/her own language in this sense,
even though this algorithm sometimes does not work as well as we want
it to -- we sometimes have trouble finding the words we need to
express our ideas.  But some sort of algorithm is there nevertheless.

Peter Menzel suggests that neural networks are of interest for
computational linguists trying to develop nonalgorithmic models,
though not for theoretical linguists.  But as noted above, he
believes that theoretical linguistics is concerned with the
relationship of language to the brain.  First of all, I don't see why
neural networks shouldn't be part of our language algorithm, but my
confusion on this point may be related to the discussion in the
previous paragraph.

More clearly, since the study of neural networks formulates
hypotheses about neurons and how they are linked together in the
brain, doesn't it follow that neural networks are of interest for
theoreticians as well?

Chomsky recently informed me by email that he would like to find a way
to use neural networks, but didn't see a way to do this.  Well, I
think others in the field, including myself, have done some fairly
detailed work in this direction.

Dan Maxwell

---------------------------------------------------------------------------
LINGUIST List: Vol-9-396
