Corpora: Chomsky and corpus linguistics

Mon Apr 9 01:50:39 UTC 2001

In teaching Intro to Linguistics many of us (generative and non-generative
linguists alike) state that what linguists try to understand is "what we
know when we know a language." And during the course when students equate
language with written language we will probably emphasise that spoken
language is of primary interest to linguists.

In other words, the "science" of Linguistics (in North America at least)
is concerned with the cognitive structures associated with spoken
language.

The question arises of how to understand the nature of these cognitive
structures. We can perhaps take note of what psychologists and cognitive
scientists can tell us about cognitive structure in general. And we can
also reasonably assume that language cognitive structures are related in
some fairly direct way to language performance; conversely,
language performance (collected in corpora) provides some of the best
evidence we have about the nature of cognitive structures.

This approach (a version of corpus linguistics) seems to me to be as
theoretical and scientific as any other paradigm in Linguistics.

> be much more engineers than scientists.  Chomsky, OTOH, is a
> scientist.  Sometimes the scientists produce things the engineers can

Chomsky acts more like a philosopher than any regular scientist. It is
clear that he is more interested in the structure of arguments and in the
form of his theories than in the relation between theory and data.
(Actually, he is interested in the relation between theory and data, but
not in the way that a scientist is.)  I think it is uncontroversial to say
that over his long career the amount of data that is accounted for by his
successive theories has diminished, but the bonus for him has been that
the form of UG has become simpler (in some sense).

How does he approach the task of explaining the miracle of language
learning? Mike Tomasello at the Max Planck Institute in Leipzig is
collecting a massive amount of data on the input that some children
receive and on the output that they produce. Tomasello is a psychologist;
he is not going to ignore mental processes or cognitive structures and
expect to find explanations in the data alone, but he takes the
scientific stance that we need to know what kind of input the child
receives. Chomsky, on the other hand, typically argues on the basis of
logical necessity---as he sees it.

"Gross observations suffice to establish some qualitative CONCLUSIONS.
Thus, it is clear that the language each person acquires is a rich and
complex construction HOPELESSLY UNDERDETERMINED by the fragmentary
evidence available." (Reflections on Language p10)

Linguists can make their own choice as to whether the Tomasello approach
or the Chomsky approach is more likely to produce results, but I don't see
how adopting the minimal-empirical-data approach can be cast as the more
scientific of the two.

(As an aside, I should be explicit in saying that I don't think there is a
way to evaluate whether a particular research enterprise is worth
embarking on. The value of a research program can only be judged after
some research has been done, and this is perhaps behind some of the
prodding of generative linguists to show what their research enterprise
has led to.)

Another typical Chomsky quote:

"Because of the sometimes intricate connections among the various
subtheories, small changes in the formulation of some principle or notion
may have large-scale and wide-ranging consequences. Such problems will
typically arise insofar as we eliminate specific rule systems in favor of
systems determined by setting parameters of UG. This is naturally a
positive development, one that is inherent in any serious effort to deepen
explanatory power, but it also means that theoretical proposals face a far
more difficult empirical challenge than in earlier work. Furthermore,
arguments become more intricate as the options for selecting rule systems
are reduced." (Barriers p2)

Mike Maxwell presumably sees this step as equivalent to Newton's step
backward. Again, I would say that it illustrates the view of Chomsky as a
philosopher. He is pursuing a line of inquiry in which those parts of UG
which are carrying a descriptive load (i.e. accounting for empirical data)
are eliminated with the hope that the empirical data can be accounted for
in different ways (i.e., by more "explanatory" systems). I don't see this
as particularly scientific. Chomsky will not be bothered by a loss of
empirical coverage because he puts great store in a particular form of
theory, one that is minimal, parsimonious and highly deductive. There is
nothing particularly scientific about his stance given that there is no
evidence to suggest that language cognitive structures are minimal,
parsimonious and highly deductive.

I believe that Chomskyan linguistics is not so much an idealisation as an
untestable theory. Each of the components comes in a variety of versions
with the result that it is impossible to judge the whole. The empirical
data used consists of grammaticality judgements, and the relation between
those judgements and language cognitive structures itself needs
investigation. Finally, the only evaluation criterion for the theory is
the form of the theory itself (it must not be too descriptive).

What if Chomsky provided the answer to the miracle of language learning?
Would that give us the theory we needed to understand what we know when we
know a language? No. We would only know how we start to learn a language.

Michael
----------------------------------------------------------------------
Michael Barlow,      Department of Linguistics,       Rice University
barlow at rice.edu				      www.ruf.rice.edu/~barlow
Athelstan barlow at athel.com  www.athel.com (U.S.) www.athelstan.com (UK)