[Corpora-List] Chomsky
T. Florian Jaeger
tiflo at stanford.edu
Thu Oct 14 14:49:27 UTC 2004
Hi,
I agree with Bob. On the one hand, Chomsky (at least in his early work)
sharply distinguishes between competence and performance (and any language
data belongs to the performance category, including corpus data). On the
other hand, he does not say that corpus data is 'defective' or 'corrupted'.
As Bob said, corpora do not provide explicit negative evidence. Statistically,
though, the larger and more balanced a corpus is, the more likely it becomes
that the absence of a structure [rather than of a specific string instance of
that structure] actually means that the structure does not exist in the
language. Arguably, even current Gigaword corpora are still quite small for
that purpose.
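To make the statistical point concrete, here is a rough sketch in Python. It
assumes (unrealistically) that a structure turns up in each sentence
independently with a fixed probability; the function names and the example
rate are made up for illustration and are not taken from any corpus.

```python
import math


def prob_absent(rate_per_sentence: float, num_sentences: int) -> float:
    """Chance that a structure with the given per-sentence rate never
    shows up in a sample, assuming independent sentences (an idealization)."""
    if not 0.0 < rate_per_sentence < 1.0:
        raise ValueError("rate_per_sentence must be strictly between 0 and 1")
    # (1 - p) ** n, computed via log1p for numerical stability with tiny p
    return math.exp(num_sentences * math.log1p(-rate_per_sentence))


def sentences_needed(rate_per_sentence: float, max_miss_prob: float) -> int:
    """Smallest sample size for which a structure with this rate would be
    missed entirely with probability at most max_miss_prob."""
    if not 0.0 < max_miss_prob < 1.0:
        raise ValueError("max_miss_prob must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss_prob) / math.log1p(-rate_per_sentence))


if __name__ == "__main__":
    rate = 1e-6  # hypothetical: one occurrence per million sentences
    for size in (10**5, 10**6, 10**7, 10**8):
        print(f"{size:>11,} sentences: P(absent) = {prob_absent(rate, size):.4f}")
    print("sentences for a 5% miss chance:", sentences_needed(rate, 0.05))
```

Under these assumptions, a structure that is grammatical but rare can easily
be absent from a corpus by chance. Absence only becomes informative once the
corpus is much larger than the inverse of the rate one is willing to consider.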
Schuetze (1996) wrote a master's thesis about 'The empirical basis of
linguistics'. It discusses the competence-performance distinction as well
as what kind of data is valid for which kind of argument. He focuses mostly
on acceptability judgments but, as I recall, the book also contains quotes
from Chomsky and discussion by Schuetze with regard to corpus work. Another
book that touches on similar issues (from a different angle) is
Wasow (2002) "Post-verbal behavior".
Hope that helps,
Florian
At 10:08 AM 10/14/2004 -0400, Bob Knippen wrote:
>Mª Belén Díez Bedmar wrote:
>
> > I'm looking for the exact bibliographical reference where we can find
> > Chomsky's idea that a corpus presents a language that is defective or
> > corrupted.
>
>To my knowledge, he never says any such thing.
>
>He does say, in several places (Syntactic Structures, 1957 comes to
>mind), that corpora do not provide the kind of information about
>linguistic competence that Linguistics ought to be after.
>
>In particular, he says that corpora do not provide information about
>what is ungrammatical, and he says something to the effect that
>corpora, being finite, do not shed light on the infinite generative
>capacity of language. (That is, a statistical model based on a
>particular corpus is not a model of the language in general).
>
>I very much doubt he wrote that a corpus presents a language that is
>defective or corrupted.
>
>Bob
>
>
>--
>Bob Knippen
>Computer Science Department
>110 Volen Center
>Mail Stop 018
>Brandeis University
>415 South Street
>Waltham, MA 02254-9110
>781-736-2745
>http://www.cs.brandeis.edu/~knippen
>
>
T. Florian Jaeger
Ph.D. student
Linguistics Department,
Stanford University,
MJH, Bldg. 460,
Stanford, CA 94305-2150,
USA
Phone: +1 (650) 725 2323
Fax: +1 (650) 723 5666
Email: tiflo at stanford.edu
Url: http://www.stanford.edu/~tiflo/

From 09/2004 to 12/2004
Visiting Student
Department of Linguistics & Philosophy,
MIT,
77 Massachusetts Avenue, 32-D808,
Cambridge, MA 02139,
USA
Phone: +1 (650) 799 2631
Fax: +1 (617) 253 5017
Email: tiflo at mit.edu