[Corpora-List] Chomsky
Mcenery, Tony
a.mcenery at lancaster.ac.uk
Thu Oct 14 15:06:48 UTC 2004
Dear All,
Chomsky has rarely discussed corpora and has certainly never precisely said something like 'corpora contain degenerate language so I do not like them!'. However, he has certainly said things about spontaneous speech which would lead one to conclude that he would view corpora constructed of such material as representing degenerate language.
For example, he claims that children are exposed to a sample of language that is "a highly degenerate sample, in the sense that much of it must be excluded as irrelevant and incorrect - thus the child learns rules of grammar that identify much of what he has heard as ill-formed, inaccurate, and inappropriate" (Chomsky, 1972, Language and Mind, Harcourt Brace, pp 170-171).
Others have certainly read this as saying that spoken language (and by extension a spoken language corpus) is degenerate. For example Pateman (1982) summarises the view as "the child emerges with a grammar (or grammars) with infinite generative power after exposure to a finite and, Chomsky would say, small and often degenerate corpus of speech which is addressed to it".
Hope this helps.
Best,
Tony
P.S. Pateman's work is on the web, see http://www.selectedworks.co.uk/chomskypapert.html
________________________________
From: owner-corpora at lists.uib.no on behalf of Bob Knippen
Sent: Thu 14/10/2004 15:08
To: corpora
Subject: Re: [Corpora-List] Chomsky
Mª Belén Díez Bedmar wrote:
> I'm looking for the exact bibliographical reference where we can find
> Chomsky's idea that a corpus presents a language that is defective or
> corrupted.
To my knowledge, he never says any such thing.
He does say, in several places (Syntactic Structures, 1957 comes to
mind), that corpora do not provide the kind of information about
linguistic competence that Linguistics ought to be after.
In particular, he says that corpora do not provide information about
what is ungrammatical, and he says something to the effect that
corpora, being finite, do not shed light on the infinite generative
capacity of language. (That is, a statistical model based on a
particular corpus is not a model of the language in general).
I very much doubt he wrote that a corpus presents a language that is
defective or corrupted.
Bob
--
Bob Knippen
Computer Science Department
110 Volen Center
Mail Stop 018
Brandeis University
415 South Street
Waltham, MA 02254-9110
781-736-2745
http://www.cs.brandeis.edu/~knippen