[Corpora-List] Chomsky

Mcenery, Tony a.mcenery at lancaster.ac.uk
Thu Oct 14 15:06:48 UTC 2004


Dear All,
 
Chomsky has rarely discussed corpora, and he has certainly never said anything as blunt as 'corpora contain degenerate language so I do not like them!'. However, he has said things about spontaneous speech from which one could conclude that he would view corpora built from such material as representing degenerate language. For example, he claims that children are exposed to a sample of language that is "a highly degenerate sample, in the sense that much of it must be excluded as irrelevant and incorrect - thus the child learns rules of grammar that identify much of what he has heard as ill-formed, inaccurate, and inappropriate" (Chomsky, 1972, Language and Mind, Harcourt Brace, pp. 170-171).

Others have certainly read this as saying that spoken language (and, by extension, a spoken language corpus) is degenerate. For example, Pateman (1982) summarises the view as "the child emerges with a grammar (or grammars) with infinite generative power after exposure to a finite and, Chomsky would say, small and often degenerate corpus of speech which is addressed to it".

Hope this helps.

Best,
 
Tony
 
P.S. Pateman's work is on the web; see http://www.selectedworks.co.uk/chomskypapert.html

________________________________

From: owner-corpora at lists.uib.no on behalf of Bob Knippen
Sent: Thu 14/10/2004 15:08
To: corpora
Subject: Re: [Corpora-List] Chomsky





Mª Belén Díez Bedmar wrote:

  > I'm looking for the exact bibliographical reference where we can find
  > Chomsky's idea that a corpus presents a language that is defective or
  > corrupted.

To my knowledge, he never says any such thing.

He does say, in several places (Syntactic Structures, 1957, comes to
mind), that corpora do not provide the kind of information about
linguistic competence that linguistics ought to be after.

In particular, he says that corpora do not provide information about
what is ungrammatical, and he says something to the effect that
corpora, being finite, do not shed light on the infinite generative
capacity of language.  (That is, a statistical model based on a
particular corpus is not a model of the language in general.)

I very much doubt he wrote that a corpus presents a language that is
defective or corrupted.

Bob


--
Bob Knippen                            
Computer Science Department
110 Volen Center
Mail Stop 018
Brandeis University    
415 South Street                       
Waltham, MA 02254-9110                 
781-736-2745                                   
http://www.cs.brandeis.edu/~knippen


