[Corpora-List] corpora & chomsky

Florian Jaeger tiflo at csli.stanford.edu
Thu Oct 14 14:54:20 UTC 2004


Hi,

I agree with Bob. On the one hand, Chomsky (at least in his early work)
sharply distinguishes between competence and performance (and any language
data belongs to the performance category, including corpus data). On the
other hand, he does not say that corpus data is 'defective' or
'corrupted'. As Bob said, corpora do not provide explicit negative
evidence. Statistically, though, the larger and better balanced a corpus
is, the more likely it becomes that the absence of a structure (rather
than of a specific string instance of that structure) actually means the
structure does not exist in the language. Arguably, even current Gigaword
corpora are still quite small for that kind of inference.

Schuetze (1996) wrote a master's thesis, 'The empirical base of
linguistics'. It contains discussions of the competence-performance
distinction as well as what kind of data is valid for which kind of
arguments. He focuses mostly on acceptability judgments but, as I recall,
the book contains quotes from Chomsky and discussion by Schuetze with
regard to corpus work as well. Another book that touches on similar
issues (from a different angle) is Wasow (2002), "Postverbal Behavior".

Hope that helps,

Florian


