Corpora: Chomsky/Harris

Michael Barlow barlow at ruf.rice.edu
Mon Apr 2 02:54:23 UTC 2001


As I remember Chomsky's writings, his initial pronouncements admitted the
usefulness of corpora, but his later (though still early) writings were
more critical.
He was certainly critical of the notion that frequency was of interest to
linguists --- although this is actually one area in which there has been
some "convergence" in recent years. It seems to me that these days anyone
can refer to frequency as a factor in grammatical systems without being
thought of as a raving empiricist.

Below are some Sunday evening thoughts on the relation between theoretical
(i.e., generative) linguistics and corpus linguistics. I must admit that
over the last year or so I have been grappling with the problem of what
generativity means within corpus linguistics --- and I have to say that I
don't have a good answer to that question. (I think the answer has
something to do with blending (a la Fauconnier and Turner), but I have to
admit that this notion could be stretched to cover anything.) Generative
theory might be what I need, but the problem is that the word or lexical
category (with the odd fixed chunk thrown in) is taken to be the basic
combinatoric unit. This reminds me that one thing Chomsky did was shift
attention from local dependencies (and Markov models) to long-distance
dependencies, and I guess I am still stuck wondering how to capture local
dependencies in a grammatical description.

I got sidetracked. Some thoughts:

1. Chomskyan approaches: I see a large gulf between Chomskyan theory and
corpus linguistics. I suppose that some might go for a division of labour
in which a UG-based system is applicable for basic (core) grammar
acquisition, while a usage-based learning system is seen as most
appropriate for peripheral grammar. Rather perversely, I think of UG
theory as a source of interesting data patterns rather than a source of
theoretical underpinnings.

2. West Coast generative approaches: In my view the richness of
description in West Coast theories (HPSG, LFG and Construction Grammar)
makes them better candidates as "corpus-ready" frameworks. Proponents of
these theories would probably say that they don't need to be extended at
all. Idiom chunks such as "take advantage of" are treated using the basic
machinery of HPSG, and naturally Construction Grammar can handle
constructions.

3. Langacker's Cognitive Grammar: Since this is a "maximalist",
"non-reductive", "bottom-up" approach to grammatical description, it lends
itself well to corpus approaches. I can see a corpus linguistics theory
being built on a Cognitive Grammar framework, but I know that others
disagree with this.

4. Probabilistic approaches: Rens Bod led a well-attended "workshop" on
Probability Theory in Linguistics at the last LSA meeting in Washington
DC. It will be interesting to see exactly how stochastic models are
incorporated into generative and corpus-based approaches to grammar over
the rest of the decade. I have not yet read Rens' book "Beyond Grammar"
and so I won't say any more on this topic.

Michael
----------------------------------------------------------------------
Michael Barlow,      Department of Linguistics,       Rice University
barlow at rice.edu                            www.ruf.rice.edu/~barlow
Athelstan barlow at athel.com  www.athel.com (U.S.) www.athelstan.com (UK)
