16.1505, Disc: Re: A Challenge to the Minimalist Community

LINGUIST List linguist at linguistlist.org
Wed May 11 23:01:46 UTC 2005


LINGUIST List: Vol-16-1505. Wed May 11 2005. ISSN: 1068 - 4875.

Subject: 16.1505, Disc: Re: A Challenge to the Minimalist Community

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Dooley, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Michael Appleby <michael at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================

1)
Date: 10-May-2005
From: Carson Schutze < cschutze at ucla.edu >
Subject: Re: A Challenge to the Minimalist Community

2)
Date: 11-May-2005
From: Charles Yang < charles.yang at alum.mit.edu >
Subject: Re: A Challenge to the Minimalist Community

3)
Date: 11-May-2005
From: Anjum Saleemi < saleemi at ncnu.edu.tw >
Subject: Re: A Challenge to the Minimalist Community

	
-------------------------Message 1 ----------------------------------
Date: Wed, 11 May 2005 18:53:54
From: Carson Schutze < cschutze at ucla.edu >
Subject: Re: A Challenge to the Minimalist Community


I see my attempt at a simple metaphor has gone awry. And we seem
to be spiraling down into a general discussion of "How can proponents
of theory X ever show that it is right/wrong/nonvacuous?" Over the
years such discussions on the List have not been very fruitful, in my
opinion. But I think the Sproat & Lappin challenge raised a much more
specific point that risks getting lost.

[Sorry for consuming so much bandwidth. I don't foresee the need to
say anything further. And let me acknowledge Richard Sproat for
some off-list discussion that helped me to clarify some points; he is of
course not responsible for anything I say below.]

Emily Bender drew the following conclusion from my metaphor:
> If I've understood the point of this analogy, it is that building a system
> which can take UG and some natural language input and produce a
> grammar which can be used to assign structures to (at least the
> grammatical) strings in some corpus of language is somehow outside
> the original point of what P&P was trying to do.

No, that was not the point. The point was that trying to compare the
success of two systems (vehicles) at accomplishing a single task
(going really fast) is pretty meaningless if you totally ignore all the
other things the systems can or cannot do, e.g. support family
transportation needs (something that one of the candidates, the
Corvette, was never designed to do and shows no signs of being able
to do). [Of course, opinions differ on whether something shows signs
of being able to do X; see below.] This is not to say that going fast
was not *a* goal in the design of the SUV as well (does anyone ever
design a vehicle with the intent of it NOT being able to go fast?
perhaps a go-kart); it's simply that other desiderata were considered
higher priorities to worry about first (for what many of us consider
principled reasons).

Just to be crystal clear (and I don't pretend to speak for all P&Pers
here): I have no objection to the suggestion that P&P might benefit
from trying to build a wide-coverage parser, or from implementing
aspects of the theory in some other way, or from pursuing proofs as to
whether it is capable of (learning to) parse. Others may have strong
feelings that this would be unproductive at this stage; I'm agnostic,
and that's not relevant to my point.

My point is that the comparison, which was fairly explicit in S&L's
original posting, between P&P and statistical (and other, though they
focused on statistical) parsers doesn't make sense. Here's some text
from the challenge:

> What is particularly notable about the Klein-Manning grammar
> induction procedures is that they do what Chomsky and others
> have argued is impossible: They induce a grammar using general
> statistical methods which have few, if any, built-in assumptions
> that are specific to language.

To even debate this, we would have to establish a definition
for "grammar"; earlier in the paragraph this system is described as
inferring a "parser", which, as has been discussed, is crucially not the
same thing under usual interpretations of these terms.

The important point is the suggestion that some 'alternative(s)' to P&P
can supposedly do "what Chomsky and others have argued is
impossible ... induce a grammar". Here we have a comparison based
on a false premise, it seems to me. What is the evidence that the
Klein/Manning algorithms induce a grammar that has the properties
Chomsky argued required innate structure to learn? All we've been
told about it is that it parses some corpora at some rate less than 80%
but is "quickly converging" on that level of accuracy. No one in P&P
ever claimed that inducing the ability to parse a representative subset
of a corpus of everyday speech to a certain approximation (given POS
tags) required innate linguistic machinery. That's not the basis of any
poverty-of-the-stimulus argument. We haven't even been told whether
this statistical learner systematically distinguishes well-formed from ill-
formed novel input, a sine qua non for the sorts of systems Chomsky
is talking about.

Later on we find the following:

> If the claims on behalf of P&P approaches are to be taken seriously,
> it is an obvious requirement that someone provide a computational
> learner that incorporates P&P mechanisms, and uses it to
> demonstrate learning of the grammar of a natural language.
>
> **With this in mind, we offer the following challenge to the
> community.**
>
> We challenge someone to produce, by May of 2008, a working P&P
> parser that can be trained in a supervised fashion on a standard
> treebank, such as the Penn Treebank, and perform in a range
> comparable to state-of-the-art statistical parsers.

What are we to make of "with this in mind" as a connective between
the upper (and preceding) paragraphs and the lower? The former
talks about learning a grammar of a natural language. The latter talks
about correctly parsing 90% of examples sampled from some corpus
the system was trained on. Accomplishing the very narrow parsing
task in S&L's challenge hardly tells us anything about whether some
system is or is not able to learn a natural language grammar, so if our
goal is really studying how humans acquire grammars, the challenge
is virtually irrelevant to that goal.

I suppose that someone of the S&L persuasion might sum up the
argument thus [I'm speaking purely hypothetically, following the lead of
S&L in suggesting what "the other side" might say]:

"How do humans learn and parse human language? Chomsky says
this ability relies on innate language-specific knowledge. But *we*
have statistical systems that we claim can achieve part of what
humans do, without any innate language-specific knowledge. We've
solved/are on the verge of solving (part of) the problem you said only
your approach could solve, so you'd better convince us that at the
very least you can indeed solve that problem too. Then we'll have two
promising theories that we can try out on other parts of the bigger
problem."

To show what's wrong with this, despite some trepidation I cannot
resist one final vehicular analogy.

"What makes a car work in its primary function (as a self-propelled
device)? You claim that an engine is absolutely crucial. Now we
observe that one of the properties that cars have is that if you push
them, they will roll for a while (e.g. when the battery is dead). I've built
a contraption (a little red wagon, say) that will roll for a while if you
push it. Therefore, your claim that an engine is necessary to make a
car work is now seriously in jeopardy, because my little red wagon
doesn't have an engine, and look, it rolls almost as well as a fast car,
and better than an SUV. We should explore little red wagons as
alternatives to cars."

To avoid misinterpretation:

engine = innate knowledge
roll on wheels = (learn to) approximately parse a corpus after training on it
self-propulsion = acquiring human language
car = human: can do lots of things, of which rolling after a push is
one, and obviously not totally unrelated to its critical function of
self-propulsion, but not one of the more difficult things to get it to
do either
SUV = current-day P&P model, according to S&L, who might say it
doesn't roll at all

    Carson


Linguistic Field(s): Computational Linguistics
                     Discipline of Linguistics
                     Linguistic Theories



	
-------------------------Message 2 ----------------------------------
Date: Wed, 11 May 2005 18:54:16
From: Charles Yang < charles.yang at alum.mit.edu >
Subject: Re: A Challenge to the Minimalist Community

	

I would like to add two points to the current discussion.

First, the challenge probably has been met - and many years ago.
Broad coverage parsers based on Government Binding / Minimalism
DO exist.  The earliest commercial application I am aware of was Bob
Kuhns' GB parser that was used to summarize newswire stories in the
1980s, published at the COLING conference in 1990. A more glaring
omission is Dekang Lin's Principles-and-Parameters-based parsers -
unambiguously dubbed PRINCIPAR and MINIPAR respectively - which
have been used in a variety of applications and have figured
prominently in computational linguistics. For instance, for the task of
pronoun antecedent resolution, Lin's P&P-based system compared
favorably against much larger and more expensive programs at
DARPA's 6th Message Understanding Conference (MUC-6) in 1995.
One of the reasons for its success was the implementation of - God
forbid - the binding theory, in addition to other discourse constraints
on pronoun resolution.

MINIPAR is a parsing system based on the Minimalist formalism, and
has been around for at least 8 years: I evaluated - and recommended -
the parser for a major computer company in the summer of 1997.
According to Lin's website,
http://www.cs.ualberta.ca/~lindek/minipar.htm, ''MINIPAR is a broad-
coverage parser for the English language. An evaluation with the
SUSANNE corpus shows that MINIPAR achieves about 88% precision
and 80% recall with respect to dependency relationships. MINIPAR is
very efficient, on a Pentium II 300 with 128MB memory, it parses about
300 words per second.'' You can even download a copy.  I suspect
that no reward is necessary: Dekang Lin is currently at Google, Inc.
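For readers outside parsing, the precision and recall figures Lin
quotes are standard measures over dependency relations. A minimal
sketch of how they are computed (the sentence and dependency triples
below are invented for illustration, not taken from MINIPAR's output):

```python
def precision_recall(predicted, gold):
    """Precision/recall over dependency relations.

    Each relation is a (head, label, dependent) triple.  Precision is
    the fraction of predicted relations that are correct; recall is
    the fraction of gold relations that were recovered.
    """
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold
    return len(correct) / len(predicted), len(correct) / len(gold)

# Hypothetical analyses of "MINIPAR parses English text":
gold = {("parses", "subj", "MINIPAR"),
        ("parses", "obj", "text"),
        ("text", "mod", "English")}
predicted = {("parses", "subj", "MINIPAR"),
             ("parses", "obj", "text"),
             ("text", "subj", "English")}  # one mislabeled relation

p, r = precision_recall(predicted, gold)
# precision = recall = 2/3 on this toy example
```

A parser can trade one measure against the other (e.g. by emitting
only relations it is confident about), which is why Lin reports both.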

My second point has to do with the success of statistical parsing. In
my experience, most linguists don't give a damn about parsing, or
computers, for that matter: they are not paid to develop technologies
that may one day interest Microsoft. Yet I invite those who are in the
business of (statistical) parsing to reflect on their success. In my
view, the improvement in parsing quality over the past decade or so
has less to do with breakthroughs in machine learning than with the
enrichment of the representation of syntactic structures over which
statistical induction can take place.  The early 1990s parsers using
relatively unconstrained stochastic grammars were disastrous
(Charniak 1993). By the mid 90s, notions like head and lexical
selection, both of which are tried and true ideas in linguistics, had
been incorporated into statistical parsers (de Marcken 1995, Collins
1997). The recent, and remarkable, work of Klein and Manning (2002)
takes this a step further. So far as I can tell, in the induction of a
grammatical constituent, Klein & Manning's model keeps track not only
of the constituent itself, but also of its aunts and sibling(s) in the
tree structure. These additional structures are what they refer to
as ''context''; those with a more traditional linguistics training may
recall ''specifier'', ''complement'', ''c-command'', and ''government''.
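The constituent-plus-context idea can be pictured with a much-simplified
sketch (mine, not Klein & Manning's actual model, and using only linear
context rather than tree-structural context): for every candidate span
of a POS-tagged sentence, record both its yield and the material
immediately surrounding it, which is the evidence such a learner scores.

```python
def spans_with_context(tags):
    """Enumerate every candidate span of a POS-tag sequence, pairing
    its yield (the would-be constituent) with its linear context
    (the tags immediately before and after the span)."""
    out = []
    n = len(tags)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield_ = tuple(tags[i:j])
            context = (tags[i - 1] if i > 0 else "<s>",
                       tags[j] if j < n else "</s>")
            out.append((yield_, context))
    return out

# Toy tag sequence for "the dog barks":
pairs = spans_with_context(["DT", "NN", "VBZ"])
# The span (DT, NN) occurs in context (<s>, VBZ); seeing DT NN
# recur in such contexts is evidence that it is a constituent.
```

Distributional learning over such pairs is, in spirit, what a
traditional linguist does when classifying phrases by the positions
they can occupy.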

If this interpretation is correct, then the rapid progress in statistical
parsing offers converging evidence that the principles and constraints
linguists have discovered are right on the mark, and if one wishes, can
be put into use for practical purposes. (And perhaps linguists deserve
a share of the far larger pot of research funds available to natural
language engineers.) This, then, would seem to be a time to rejoice
and play together, rather than driving a wedge of ''challenge'' between
the two communities.

Charles Yang
Yale University

References

Charniak, E. 1993. Statistical Language Learning. Cambridge, MA:
MIT Press.

Collins, M. 1997. Three generative, lexicalized models for statistical
parsing. Proceedings of ACL 1997, Madrid.

de Marcken, C. 1995. On the unsupervised induction of phrase
structure grammars. Proceedings of the 3rd Workshop on Very Large
Corpora. Cambridge, MA.

Klein, D. & Manning, C. 2002. Natural language grammar induction
using a constituent-context model. Advances in Neural Information
Processing Systems 14 (NIPS 2001).


Linguistic Field(s): Computational Linguistics
                     Discipline of Linguistics
                     Linguistic Theories




	
-------------------------Message 3 ----------------------------------
Date: Wed, 11 May 2005 18:54:38
From: Anjum Saleemi < saleemi at ncnu.edu.tw >
Subject: Re: A Challenge to the Minimalist Community

	

Much of the recent discussion about Minimalism reminds me of a
prevalent trend witnessed many times before on the LINGUIST list in
the course of other similar discussions. As linguists we seem to be far
too driven by supposedly significant methodological and
computational imperatives, or even by mere notational determinants.
My recollection of most past debates of this nature is that they often
deteriorate into sterile argumentation. While issues bearing on
methodology, computational tractability, and so forth remain
important, surely none of them can be considered to constitute
a decisive testing ground for what is or isn't a good theory.  Usually
we come to know that a theory is good only after the fact, that is, after
it has been formulated and found to be successful (and, therefore,
true). As John Frampton and others have implied in some of the
recent postings, a good parser is primarily just that: a good parser.
How exactly to anticipate the success (or otherwise) of a linguistic
theory even before it has been fleshed out is a question that's not only
unfair but misguided: if we already knew what a good theory in a
relatively unexplored domain was supposed to look like, we wouldn't
be in the business of striving for one in the first place!

In the end, the generative paradigm may indeed turn out to be wrong,
but over the decades it has provided most of the leading ideas in our
field, and has in addition helped us dig up a lot of new data. To the
extent that this is any indication of eventual success, I believe it
wouldn't be wise to let its fate be judged by any programming sleights
of hand.

Anjum Saleemi
National Chi Nan University
Taiwan


Linguistic Field(s): Computational Linguistics
                     Discipline of Linguistics
                     Linguistic Theories






-----------------------------------------------------------
LINGUIST List: Vol-16-1505	

	


