[Corpora-List] Chomsky

T. Florian Jaeger tiflo at stanford.edu
Thu Oct 14 14:49:27 UTC 2004


Hi,

I agree with Bob. On the one hand, Chomsky (at least in his early work) 
sharply distinguishes between competence and performance (and any language 
data belongs to the performance category, including corpus data). On the 
other hand, he does not say that corpus data is 'defective' or 'corrupted'. 
As Bob said, corpora do not provide explicit negative evidence. Statistically, 
though, the larger and more balanced a corpus is, the more likely it is that 
the absence of a structure [rather than of a specific string instance of that 
structure] really means that the structure does not exist in the language. 
Arguably, even current Gigaword corpora are still quite small for this 
purpose.

Schuetze (1996) wrote a master's thesis about 'The empirical base of 
linguistics'. It discusses the competence-performance distinction as well 
as which kinds of data are valid for which kinds of arguments. He focuses 
mostly on acceptability judgments but, as I recall, the book also contains 
quotes from Chomsky and discussion by Schuetze with regard to corpus work. 
Another book that touches on similar issues (from a different angle) is 
Wasow (2002) "Postverbal Behavior".

Hope that helps,

Florian

At 10:08 AM 10/14/2004 -0400, Bob Knippen wrote:


>Mª Belén Díez Bedmar wrote:
>
>  > I'm looking for the exact bibliographical reference where we can find
>  > Chomsky's idea that a corpus presents a language that is defective or
>  > corrupted.
>
>To my knowledge, he never says any such thing.
>
>He does say, in several places (Syntactic Structures, 1957 comes to
>mind), that corpora do not provide the kind of information about
>linguistic competence that Linguistics ought to be after.
>
>In particular, he says that corpora do not provide information about
>what is ungrammatical, and he says something to the effect that
>corpora, being finite, do not shed light on the infinite generative
>capacity of language.  (That is, a statistical model based on a
>particular corpus is not a model of the language in general).
>
>I very much doubt he wrote that a corpus presents a language that is
>defective or corrupted.
>
>Bob
>
>
>--
>Bob Knippen
>Computer Science Department
>110 Volen Center
>Mail Stop 018
>Brandeis University
>415 South Street
>Waltham, MA 02254-9110
>781-736-2745
>http://www.cs.brandeis.edu/~knippen
>
>

T. Florian Jaeger                               From 09/2004 to 12/2004
Ph.D. student                                   Visiting Student
Linguistics Department,                 Department of Linguistics & Philosophy,
Stanford University,                            MIT,
MJH, Bldg. 460,                         77 Massachusetts Avenue, 32-D808,
Stanford, CA 94305-2150,                        Cambridge, MA 02139,
USA                                             USA

Phone:  +1 (650) 725 2323               +1 (650) 799 2631
Fax:            +1 (650) 723 5666               +1 (617) 253 5017
Email:          tiflo at stanford.edu              tiflo at mit.edu
Url:            http://www.stanford.edu/~tiflo/  


