16.1288, Disc: Re: A Challenge to the Minimalist Community

Fri Apr 22 13:56:02 UTC 2005

LINGUIST List: Vol-16-1288. Fri Apr 22 2005. ISSN: 1068 - 4875.

Subject: 16.1288, Disc: Re: A Challenge to the Minimalist Community

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Dooley, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Michael Appleby <michael at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================

1)
Date: 20-Apr-2005
From: Carson Schutze < cschutze at ucla.edu >
Subject: Re: 16.1251, Disc: A Challenge to the Minimalist Community

2)
Date: 22-Apr-2005
From: Chung-chieh Shan < ccshan at post.harvard.edu >
Subject: Re: 16.1156, A Challenge to the Minimalist Community

3)
Date: 22-Apr-2005
From: Oren Sadeh-Leicht < oren.sadehleicht at let.uu.nl >
Subject: A Challenge to the Minimalist Community

-------------------------Message 1 ----------------------------------
Date: Fri, 22 Apr 2005 09:54:25
From: Carson Schutze < cschutze at ucla.edu >
Subject: Re: 16.1251, Disc: A Challenge to the Minimalist Community

Following up on Peter's point:
>
> So the P&P parser that Sproat and Lappin envision would accomplish
> much more than comparable statistical parsers, which makes the
> proposed accuracy metric a poor yardstick for comparison
>

In addition to capturing the distinction between learnable and unlearnable
languages, P&P has as an important goal capturing the distinction between
well-formed (grammatical) and ill-formed (ungrammatical) sentences
within a language. As I understand it, the challenge demands only correct
parsing of grammatical sentences, not correct rejection of ungrammatical
ones. This represents another case where the P&P system, by virtue of the
goals of the theory, is being subjected to greater demands than the
statistical parsers.

Comp Ling isn't my field either, but I gather it is a desideratum for at least
some statistical parsers that they be robust in the face of noisy input,
certainly during training but perhaps also during parsing, if they are to
avoid being completely thrown off by the occasional typo or unfamiliar
word. So it strikes me as an interesting empirical question whether such
robustness, if indeed the best statistical parsers have it, hinders them from
being able to detect ungrammaticality in general. Of course humans too
can "cope with" ill-formedness of various kinds (as Sproat and Lappin
note), but they mostly know when they are having to do so, i.e.,
ill-formedness is still detected.

So, I would like to suggest a revised version of the challenge that
incorporates a second corpus consisting of ungrammatical sentences that
are to be identified as such. (Earlier P&P parsers such as Fong's were
designed to do this, but it's not obvious that this ability will easily scale up
with broader coverage, so I don't think this is a sucker's bet.) Furthermore,
since the computationalists got to choose the corpus of good sentences, it
would seem only fair that the theoreticians get to choose the corpus of bad
sentences :-)
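
A minimal sketch of how such a two-corpus version of the challenge might be
scored, assuming each parser can be wrapped as a function that returns True
when it judges a sentence well-formed. The names `accepts`,
`score_two_corpus_challenge`, and the toy sentences are hypothetical
illustrations, not part of the challenge as posed.

```python
from typing import Callable, Dict, Iterable


def score_two_corpus_challenge(
    accepts: Callable[[str], bool],
    grammatical: Iterable[str],
    ungrammatical: Iterable[str],
) -> Dict[str, float]:
    """Score a parser on both halves of the revised challenge.

    `accepts` is a hypothetical wrapper around any parser: it returns
    True if the parser treats the sentence as well-formed.
    """
    good = list(grammatical)
    bad = list(ungrammatical)
    if not good or not bad:
        raise ValueError("both corpora must be non-empty")

    # Share of grammatical sentences the parser accepts.
    accepted_good = sum(1 for sentence in good if accepts(sentence))
    # Share of ungrammatical sentences the parser correctly rejects.
    rejected_bad = sum(1 for sentence in bad if not accepts(sentence))

    return {
        "acceptance_rate": accepted_good / len(good),
        "rejection_rate": rejected_bad / len(bad),
    }


# A maximally "robust" stand-in parser that never rejects anything.
def accept_everything(sentence: str) -> bool:
    return True


scores = score_two_corpus_challenge(
    accept_everything,
    grammatical=["the dog barks"],
    ungrammatical=["dog the barks", "barks dog the"],
)
print(scores)
```

Under these assumptions, a parser that accepts all input scores perfectly on
the first corpus and zero on the second, which is the trade-off the second
corpus is meant to expose.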

P.S. The statistical parsers will still be getting off easy, in my view, because
the unfamiliar sentences they *are* supposed to parse as well-formed are
drawn from the same sample as the training set. The set of novel sentences
humans [and P&P parsers, we hope] parse as grammatical arguably
includes sentence types that do not occur in the language learner's input.

--

Carson T. Schutze            Department of Linguistics, UCLA
Web: http://www.linguistics.ucla.edu/people/cschutze

Linguistic Field(s): Computational Linguistics
                     Discipline of Linguistics

-------------------------Message 2 ----------------------------------
Date: Fri, 22 Apr 2005 09:54:28
From: Chung-chieh Shan < ccshan at post.harvard.edu >
Subject: Re: 16.1156, A Challenge to the Minimalist Community

In response to Richard Sproat and Shalom Lappin's challenge (16.1156),
Peter Hallman (16.1251) draws a contrast between the Principles and
Parameters (P&P) approach and statistical approaches to parsing.

    A statistical parser can, within physical limitations, recognize
    and learn any statistically significant pattern, not merely those
    patterns that occur in human languages....  The P&P framework
    seeks to answer the question

    (Q) What is a possible human language (type)?

    The P&P parser that Sproat and Lappin envision would answer this
    question; comparable statistical parsers do not.

He suspects that it would be "unrealistic" for a P&P parser to reach accuracy
comparable to current statistical parsers in three years, for two reasons.
First, as the paragraph above concludes, a P&P parser would accomplish
more than current statistical parsers.  Second, current P&P theory may not
be "ready to form the basis of a trainable parser".

I am more optimistic for P&P.  To me, these same two reasons indicate
Sproat and Lappin's challenge to be realistic rather than unrealistic.

First, a statistical parser is only hindered when it recognizes patterns that
do not occur in human languages.  The larger the space of hypotheses to
explore, the less effective machine learning can be. Conversely, many
advances in statistical parsing (going back as far as probabilistic regular
and context-free grammars) are made precisely by better
delineating "those patterns that occur in human languages", such as locality
and hierarchy.  In other words, a statistical parser embodies an
(approximate) answer to the question Q, just as a P&P parser or theory
does.  A better answer should give rise to a better parser.

Second, the attention that the P&P approach pays to language acquisition
corresponds directly to payoffs in parsing performance.  For example, a
parser whose design addresses the poverty of the stimulus should require
less training data, less supervision, or both.  Such a parser would be able to
learn from the Penn Treebank better, take advantage of vast amounts of
unlabeled corpora, or both.

In sum, a parser that better "connect[s] typological universals to the
mechanism of language learning" will fare better in accuracy, all other
things being equal.  That one linguistic theory may be more "ready" than
another for implementation reflects on not just the focus of different
communities (as Martha McGinnis points out, 16.1251), but also the
theories themselves.  Trying to answer the question Q is no excuse for poor
parsing.  All other things being equal, poor (or unknown) parsing
performance indicates failure at (resp. disinterest in) answering Q.

Linguistic Field(s): Computational Linguistics
                     Discipline of Linguistics

-------------------------Message 3 ----------------------------------
Date: Fri, 22 Apr 2005 09:54:31
From: Oren Sadeh-Leicht < oren.sadehleicht at let.uu.nl >
Subject: A Challenge to the Minimalist Community

The challenge suggested by Richard Sproat is in my opinion a most
important research idea, vital to the further development and expansion of
P&P, although I share some of the worries expressed by previous writers
here.

I would like to add that the positive approach to this challenge should
be "how can P&P be made to work", and not "let's see how P&P fails to meet
its claims".

There is a growing sentiment in psycholinguistic circles that P&P, though
accepted, does not deliver: it provides no practical gain in answering the
question of how language is acquired (satisfying explanatory adequacy).

Moreover, the MP is considered to be too complicated, only accessible and
understood by a small isolated group of people, therefore of no practical
use, although it makes claims about explanatory adequacy. Quantum
physics is also extremely complex and difficult to understand, yet nobody
has claimed that it is of no practical use or isolated from the real world.

Generative circles have already identified the growing disparity between
P&P and psycholinguistic research. Currently, a broad research program
headed by Janet Dean Fodor et al. (CUNY) is being carried out to satisfy
explanatory adequacy - to meet Sproat's challenge.

I hope that one of the researchers will post a message here, or that Richard
Sproat will post their messages on the matter, should he get any.

Cheers,
-Oren.

Linguistic Field(s): Computational Linguistics
                     Discipline of Linguistics
