16.1439, Disc: Re: A Challenge to the Minimalist Community
LINGUIST List
linguist at linguistlist.org
Thu May 5 20:53:30 UTC 2005
LINGUIST List: Vol-16-1439. Thu May 05 2005. ISSN: 1068-4875.
Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews (reviews at linguistlist.org)
Sheila Dooley, U of Arizona
Terry Langendoen, U of Arizona
Homepage: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Michael Appleby <michael at linguistlist.org>
================================================================
===========================Directory==============================
1)
Date: 05-May-2005
From: Carson Schütze < cschutze at ucla.edu >
Subject: Re: A Challenge to the Minimalist Community
2)
Date: 05-May-2005
From: Richard Sproat < rws at xoba.com >
Subject: Re: A Challenge to the Minimalist Community
-------------------------Message 1 ----------------------------------
Date: Thu, 05 May 2005 16:47:14
From: Carson Schütze < cschutze at ucla.edu >
Subject: Re: A Challenge to the Minimalist Community
Ash Asudeh [LL 16.1364] said
> Some confusion has arisen in the subsequent discussion of the
> Sproat-Lappin challenge. Most of the subsequent posts discuss
> statistical parsing versus P&P parsing. However, the challenge has
> nothing to do with statistical parsers per se
I would like to do some clarifying of my own. Let's see how the
challenge was worded: [LL 16.1156]
> We challenge someone to produce, by May of 2008, a working P&P
> parser that can be trained in a supervised fashion on a standard
> treebank, such as the Penn Treebank, and perform in a range
> comparable to state-of-the-art statistical parsers.
So, statistical parsers were relevant as an existence proof that the
assigned task is doable using current technology. If there were no
systems that could parse the Treebank 90% correctly (or whatever the
standard is), then asking P&P to do so would be a very different kind
of challenge. Sproat and Lappin frame their challenge thus: other
approaches have reached this milestone; we challenge you to catch
up. From that perspective it is entirely relevant what the capabilities
and design goals of those approaches are, compared to those of P&P.
[Of course it is true that one could challenge P&P to do as well as
some nonstatistical parser, in which case that would be the system
whose capabilities/goals would be relevant. In fact one could invent a
new challenge by simply omitting the word "statistical" from the
original, but (a) Sproat and Lappin explicitly included it; (b) I think that
would make it harder to establish a metric for state-of-the-art-hood,
because it would involve apples-and-oranges comparisons, but I'm
sure others will disagree here.]
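For concreteness, the usual metric behind figures like "parse the
Treebank 90% correctly" is PARSEVAL-style labeled bracketing
precision/recall/F1. A minimal sketch of that computation, using an
invented nested-tuple tree representation (real evaluation tools work
over Treebank bracketings, but the arithmetic is the same):

```python
# Sketch of PARSEVAL-style labeled bracketing F1, computed over
# (label, start, end) constituent spans. Trees are nested tuples
# of the form (label, child1, child2, ...) with string leaves;
# this representation is invented for illustration.

def brackets(tree):
    """Collect the (label, start, end) spans of every constituent."""
    spans = []

    def walk(node, start):
        if isinstance(node, str):      # a leaf covers one word
            return start + 1
        label, children = node[0], node[1:]
        pos = start
        for child in children:
            pos = walk(child, pos)
        spans.append((label, start, pos))
        return pos

    walk(tree, 0)
    return spans

def bracket_f1(gold, pred):
    """F1 over matched brackets (multiset intersection)."""
    g, p = brackets(gold), brackets(pred)
    matched, remaining = 0, list(g)
    for span in p:
        if span in remaining:
            remaining.remove(span)
            matched += 1
    precision = matched / len(p)
    recall = matched / len(g)
    return 2 * precision * recall / (precision + recall)
```

A maximally flat parse still earns partial credit here, which is part
of why this metric says nothing about whether a system distinguishes
good strings from bad ones.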
Then Ash says about my previous point on ungrammaticality
> I don't understand the substance of this objection. All grammars,
> those used in statistical parsing or otherwise, attempt to reject
> ungrammatical sentences: Nobody wants their grammar/parser to
> overgenerate. Even if the claim is true of statistical parsers (I don't
> think it is), it certainly isn't true of the LFG and HPSG parsers and
> grammars noted above.
Let me elaborate on John Goldsmith's [LL 16.1432] defense of my
point. Ash makes a claim about all grammars and about generation,
but the challenge doesn't require the statistical parser to have a
grammar or to generate in the relevant sense, it just requires it to map
well-formed input strings to the "right" trees. If a grammar is defined,
as Ash seems to assume and most would agree, as something that
delineates all and only the well-formed expressions of a language,
then the benchmark systems are certainly not in principle required to
have one. If they provide an output for every possible input string, with
no systematic distinction between the good and the bad, then by this
definition they don't have a grammar, at most just half of one. Even if
they did, the challenge contains nothing that would assess the set of
strings that the grammar rules out, which is why I proposed a second
part of the challenge to do so.
I think Ash and I agree that any "interesting" model (I won't try to
define "interesting", but we know what we mean :-) of human
language will include constraints against overgeneration; in those
terms, my point was that the challenge does not require the
benchmark system to be interesting. [Of course once again one could
invent a different challenge that pits a P&P parser against an HPSG
parser, where the simplest form of my objection would go away: the
benchmark system wouldn't be ignoring an entire ability that the P&P
system is designed to model. I still think it would be interesting to test
in detail whether the two systems rule out the same strings, and
whether those strings are indeed all and only the ungrammatical
strings of the language.]
So I think we agree on the overall point that comparing a P&P parser
to a parser that is committed (in the ways S&L outline) to the claims of
some other linguistic theory would be more meaningful than a
comparison with purely statistical parsers. But for those who disagree I
would still submit that a comparison with a statistical parser would be
more meaningful if it included a comparison of '(un)grammaticality
judgments'.
I do want to clarify something else John Goldsmith said, however:
> There is not universal agreement to the position that the ability to
> distinguish grammatical from ungrammatical sentences is an
> important function to be able to model directly, whether we are
> looking at humans or at software. There are certainly various
> serious parsing systems whose goal is to be able to parse, as best
> they can, any linguistic material that is given to them -- and
> arguably, that is what we speakers do too.
This comment unfortunately conflates two notions that I was at pains
to keep separate in my original posting. One is the idea that a system
will produce *some* parse for every input string you give it, including
the ungrammatical ones, rather than *just* returning "FAIL". The other
is the idea that a system will flag all ungrammatical inputs as
ungrammatical, whatever else it might do with them. The first may or
may not be an ability that humans have in full generality, and
depending on how you think they achieve it when they do, you may or
may not want to model it within your parser. But the second is
something humans unquestionably *can* do for at least the vast
majority of possible strings, and I therefore submit that any
system that purports to be a model of human language ability should
be required to do the same.
My original claim, once again, was that the challenge makes no
requirement on this second point, but that it would be much more
sensible if it did. Of course it also makes no requirement on the first
point, but I did not propose expanding the challenge to incorporate it,
for two reasons. One, which I think was John Goldsmith's main point,
is that there is much less consensus on this as a desideratum of
models of human parsing. The second is that there is almost no
empirical data against which we could test statistical, P&P, HPSG or
any other parsers with regard to how they ought to "interpret"
ungrammatical strings. I know some people can supply some
references, but their scope is extremely limited. If we consider one of
the dumbest ways of generating a test corpus of ungrammatical
sentences, namely by fully reversing the sequence of words in each of
the Treebank sentences, I don't think anyone has a clue how people
would interpret them (if at all).
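That "dumbest" corpus-generation procedure is trivial to implement; a
minimal sketch (whitespace tokenization is an assumption here -- real
Treebank text comes pre-tokenized in its own bracketed format):

```python
# Sketch of the simple ungrammatical-corpus generator described
# above: fully reverse the word order of each sentence.

def reverse_sentence(sentence):
    """Return the sentence with its word sequence fully reversed."""
    words = sentence.split()
    return " ".join(reversed(words))

def build_reversed_corpus(sentences):
    """Map each sentence to its word-reversed counterpart."""
    return [reverse_sentence(s) for s in sentences]
```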
Finally, on the general relevance of the full set of goals/capabilities of
theories, Ash says:
> The substance of the objections are that P&P is attempting to do
> much more than just parse sentences (Hallman) and that the goals
> of P&P are different to those of computational linguistics (McGinnis).
> I think there is merit to both these statements, but they are ultimately
> non sequiturs to the challenge. ... The requirement of capturing the
> adult grammar also means that it's insubstantial whether the goals of
> P&P are those of computational linguistics: P&P is still expected to
> capture adult grammatical competence in the end, even if this isn't a
> *motivation* for a lot of its practitioners.
Consider the following analogy. You and I both are given the task of
designing a motor vehicle that will get someone from point A to point
B. You come back with a Corvette, I come back with an SUV. Now you
say, "Let's go to a racetrack, I'll bet I can drive a circuit faster than
you, which means I have the better design." I will of course object:
speed was not specified as the desideratum of the vehicle. Both
vehicles can get a person from A to B. Moreover, the SUV can do lots
of things the 'vette can't: carry more than 2 people, hold lots of
luggage, play DVDs for the back seat passengers, transport moderate-
sized pieces of furniture, host a small business meeting, etc. My
motivation in designing it was to make it a multi-purpose family vehicle.
If I were now to go back to the drafting table and modify my SUV
design so that it keeps all its current features but can also go as fast
as a Corvette, surely I will have achieved a much more difficult task
than the person who just designed the Corvette.
I could have worked harder to make the analogy tighter, but the basic
point would still go through.
Carson
--
Prof. Carson T. Schütze Department of Linguistics, UCLA
Web: http://www.linguistics.ucla.edu/people/cschutze
Linguistic Field(s): Computational Linguistics
Discipline of Linguistics
-------------------------Message 2 ----------------------------------
Date: Thu, 05 May 2005 16:47:29
From: Richard Sproat < rws at xoba.com >
Subject: Re: A Challenge to the Minimalist Community
We thank the people who have responded to our challenge posted in
16.1156, both in private and on the List. A number of the responses
(mostly those offered in private) have been supportive. Others have
raised issues with our challenge. In the interests of brevity, we will
respond to the main objections rather than to individual comments:
1. It is too early to expect P&P to provide a theory that can be
implemented as part of a large-scale parsing system that learns from
data.
RESPONSE: This was our "Objection 3", which we characterized as
a "remarkable dodge". Need we say more?
2. The challenge is the wrong challenge, either because:
A. We rely on the Penn Treebank as our gold standard, whereas
there is no reason to accept the validity of the Penn Treebank
structures; they are not even theoretically interesting.
B. Providing valid structures for sentences is not the only goal or even
the most reasonable goal of syntactic theory: a syntactic theory should
also provide grammaticality judgments for sentences; a syntactic
theory should explain cross-linguistic variation.
C. Statistical approaches have it too easy since they are trained on
data that is similar in genre to the test data.
RESPONSE: If you do not like the Penn Treebank, you are free to use
any other reasonable corpus, and to provide your own annotations
and representations. The task remains the same. Show that a P&P
acquisition system can do at least as well as statistical approaches.
Regarding B, we remind readers that humans do assign structure to
sentences, that assigning structure to sentences is surely a part of
what syntax is about, that humans acquire this knowledge as part of
language acquisition, and that P&P claims to provide an explanation of
how this is achieved. So we are at a loss to understand why inducing
a large-scale working parser from sample data is not a valid test of
P&P.
The claim that statistical approaches have it "too easy" will have some
content when it is accompanied by an implemented P&P device that
matches the performance of machine learning systems. If such a
device cannot be constructed, it suggests not that statistical systems
have it too easy (the same conditions have always been on offer to
those interested in developing a large-coverage P&P parser), but that
the P&P framework is not computationally viable as a model for
language acquisition.
3. The challenge could certainly in principle be met by P&P.
RESPONSE: "In principle" doesn't count here. Only "in fact" has any
credibility.
4. The challenge is already being met.
RESPONSE: Oh really, where? We look forward to seeing convincing
evidence of this.
5. Computational linguistics is about engineering rather than science.
It may be useful for us scientists to be more aware of what is going on
in engineering, and similarly the engineers could gain some insights
from us scientists.
RESPONSE: It is true that computational linguistics often has
engineering applications and that these applications often motivate
computational linguists to address certain problems. But let's not
confuse the issue. Many computational linguists, the two present
authors included, are fully trained linguists who happen to be
interested in how computational methods can yield insights on
language. If this is not science, we do not know what is.
6. Machine learning cannot produce constraints that rule out
ungrammatical sentences. Whereas P&P seeks to characterize the
set of possible natural languages, ML just learns syntactic patterns
exhibited in a particular corpus.
RESPONSE: Machine learning has achieved induction of robust
grammars that can, in fact, be turned into classifiers able to distinguish
between acceptable and ill-formed structures over large linguistic
domains. The fact that after more than half a century of sustained
research the P&P enterprise and its antecedents have failed to
produce a single broad-coverage computational system for grammar
learning suggests that its notion of Universal Grammar encoded in a
language faculty may well be misconceived. The increasing success of
unsupervised ML techniques in grammar acquisition lends at least
initial plausibility to the proposal that general learning and induction
mechanisms, together with minimal assumptions concerning basic
linguistic categories and rule hypothesis search spaces, are sufficient
to account for much (perhaps all) of the language acquisition task.
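For illustration, one simple way such a classifier can be built (a
sketch only, not a description of any particular system) is to
threshold a probabilistic model's per-word log-probability. The toy
bigram model, smoothing constants, and threshold below are all
invented; a real system would estimate them from data:

```python
import math

# Sketch: turn a probabilistic model into an acceptability
# classifier by thresholding mean per-word log-probability.
# The bigram model and all constants are toy illustrations.

def train_bigrams(corpus):
    """Count unigrams and bigrams (with boundary markers) from a
    corpus of tokenized sentences."""
    unigrams, bigrams = {}, {}
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            unigrams[a] = unigrams.get(a, 0) + 1
            bigrams[(a, b)] = bigrams.get((a, b), 0) + 1
    return unigrams, bigrams

def score(sent, unigrams, bigrams, alpha=0.1, vocab=1000):
    """Mean log-probability per word, with add-alpha smoothing."""
    tokens = ["<s>"] + sent + ["</s>"]
    total = 0.0
    for a, b in zip(tokens, tokens[1:]):
        p = (bigrams.get((a, b), 0) + alpha) / \
            (unigrams.get(a, 0) + alpha * vocab)
        total += math.log(p)
    return total / (len(tokens) - 1)

def acceptable(sent, unigrams, bigrams, threshold):
    """Classify: acceptable iff the mean log-prob beats the threshold."""
    return score(sent, unigrams, bigrams) > threshold
```

Even this toy scores a training sentence well above its word-reversed
counterpart; the open question raised earlier in this discussion is
whether such scores track grammaticality, rather than mere
frequency, over large linguistic domains.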
7. You should have offered a monetary prize as a financial incentive
for meeting the challenge.
RESPONSE: We don't see why we need to pay people extra for
demonstrating the viability of a "research program" which has
dominated much of the field for decades, but has yet to produce
anything approaching the results that its rivals have achieved
efficiently in a relatively short period of time.
Finally, since our challenge has actually stimulated relatively little
discussion from the P&P community, we suspect the following may
also be one response:
8. Ignore the challenge because it's irrelevant to the theory and
therefore not interesting.
RESPONSE: This is the "answer" we had most anticipated. It does not
bode well for a field when serious scientific issues are dismissed and
dealt with through silence.
Richard Sproat
Shalom Lappin
Linguistic Field(s): Computational Linguistics
Discipline of Linguistics
Syntax
-----------------------------------------------------------