Corpora: Chomsky and corpus linguistics

Mcenery, Tony eiaamme at exchange.lancs.ac.uk
Sun Apr 8 15:34:49 UTC 2001


Dear All,

I have followed this thread with interest, as I am sure many have. Speaking as
what Christopher Bader has identified as a Philistine, let me try my best to
have a little snip at Samson's locks. In doing so I am putting my head above
the parapet - feel free to shoot.

> 1.  It is simply wrong to contend that Chomsky has contributed
> nothing to language technology.  His work in the 1950's and '60's
> laid part of the foundation for formal language theory.  See any
> textbook on automata and theory of computation, on the Chomsky
> Hierarchy or Chomsky Normal Form.
>
	[Mcenery, Tony]

	While in terms of early formal AI approaches to modelling language
generative theory was seen as a promising field, I would contend that the
promise was never fulfilled. The usable language technology that I know of now
owes the greatest debt to corpus-based approaches to the study of language.
Christopher argues that early on Chomsky made a great - though incidental -
contribution to language technology work by laying the foundations of formal
language theories. However, this rather reminds me of those people who point
proudly at Velcro fasteners or Teflon-coated pans and say 'that was developed
for the Apollo space programme!' when defending sending people to the moon,
i.e. the trip may not have been worth the price tag in itself, but look at the
side benefits! Surely there were cheaper ways to develop modern conveniences
than to spend millions of dollars sending a few Americans to the moon.
Similarly, developing a very dominant school of linguistics seems to have been
a rather heavy-handed way to lay the foundation of formal language theory.

> 2.  In his more recent work, Chomsky distinguishes between
> the E-language (e.g. the set of all grammatical sentences)
> and the I-language (the human language faculty).  Generative
> grammarians study the latter; corpus linguists, the former.
> The Chomsky Hierarchy and Chomsky Normal Form are
> of course concepts pertaining to the E-language, not to
> the I-language, which is why Chomsky no longer works
> in this area.
>
	[Mcenery, Tony]
	I see no problem with the above statement, other than to say that at
times linguistics has refused to count the study of E-language (in the sense of
attested language use as opposed to the concoction of invented examples) as
part of linguistics proper. The would-be Samsons on this list have said
that corpus linguists simply misunderstand this or that view taken by
Chomsky/generativists. What they don't understand is that most corpus linguists
(I guess) on the list feel entirely misunderstood by linguists working in the
Chomskyan paradigm. Take a recent quote from Smith (Smith, N., Chomsky: Ideas
and Ideals, CUP, 2000: 33) discussing concocted examples: "Appealing to examples
as complex as these often strikes non-linguists as bordering on obscurantism:
a frequent objection is 'no one would actually say that' or 'no corpus of real
utterances would contain such examples.' This reflects an unmotivated
preoccupation with facts about performance". The line taken by Smith - and he
claims to be reflecting the views of Chomsky - is disconcerting for a corpus
linguist. Smith continues to argue that scientists work on idealised examples
and that people using 'common sense' misunderstand the true goals of science.
In characterising linguistics in this way, Smith arguably casts corpus
linguists as non-linguists and non-scientific. Corpus linguists - not simply
'non-linguists' - would raise, and have raised, the objections Smith outlines. Corpus
linguists do not have an unmotivated preoccupation with facts about performance
- their preoccupations are often highly motivated though not, perhaps, in a
theoretical framework that Chomsky or his followers would approve of. While I
appreciate I am not quoting directly from Chomsky here, I think it is quite
relevant to point out how in the presentation of his ideas the work and worth
of corpus linguistics is often grossly misrepresented by those linguists who
work in the tradition Chomsky has established.

> Since generative linguists and computational linguists
> have fundamentally different objects of study, it is not
> surprising that they sometimes have trouble understanding
> each other's work.  I urge people on this list who are interested
> in Chomsky's actual views to read Knowledge of Language:
> Its Nature, Origin, and Use (1986).  It lays out in well-reasoned,
> non-technical prose the arguments for the E-language/I-language
> distinction.
>
	[Mcenery, Tony]
	Of course it is not just computational and generative linguists who
have different objects of study - as you note yourself, linguists focusing on
I-language and E-language also have different objects of study. Coming back to your point
about generative linguists and computational linguists having fundamentally
different objects of study, it is that realisation which has principally, in my
view, led to computational linguists lining up with corpus linguists. It was
the needs of those corpus linguists that drove language technology work in the
eighties away from what one may call cognitively plausible models of language
towards the development of systems which work in ways largely non-comparable to
human language processing. The shift to modelling language based on attested
language use rather than engaging with abstract theorising about idealised
speaker-hearer pairs was, I believe, the key to progress in natural language
processing. Beyond the language technology community, however, I would also
claim that the focus on corpus data by some linguists has led to more
practical applications of linguistics than work conducted in the Chomskyan
paradigm ever will. I know from previous mailings and readings that Chomsky is
'hands off' about the applications of his work - if others can apply it, so be
it, but that is not his aim. However, it may well be the case that the theories
generated by him have few if any practical applications, though I guess the
Samsons are now going to tell me all of the practical applications of the
minimalist paradigm that there are!


	Tony


