[Corpora-List] Re: Chomsky
Ute Römer
ute.roemer at Uni-Koeln.DE
Thu Oct 14 17:34:57 UTC 2004
I think what Belén refers to is Chomsky's criticism (in Aspects of the
Theory of Syntax, 1965) of the 'defective' kind of (E-)language corpora may
contain. I quote from a recent article by Jan Aarts (entitled "Does corpus
linguistics exist? Some old and new issues", published in Anna-Brita
Stenström's festschrift, 2002?; sorry, I don't have the exact reference at
hand) which includes the Chomsky 1965 quote:
"At the same time it must be said that there is a not inconsiderable number
of utterances that one comes across in corpora but will look in vain for in
descriptive grammars of language use. Among them are broken-off sentences,
false starts, repetitions of phonemes, morphemes, words and (parts of)
larger constituents, anacolutha, stretches of text from other languages or
from sub-standard varieties, as well as utterances that the speaker or
writer intended to be ungrammatical; in short, corpora contain among other
things evidence of such grammatically irrelevant conditions as memory
limitations, distractions, shifts of attention and interest and errors ...
(Chomsky 1965: 3)."
Best wishes... Ute
Just found the reference on the Rodopi website:
From the COLTs mouth ... and others.
Language Corpora Studies. In honour of Anna-Brita Stenström.
BREIVIK, Leiv Egil and Angela HASSELGREN (Eds.)
Amsterdam/New York, NY, 2002, X, 260 pp.
_____
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Shlomo Izre'el
Sent: Thursday, October 14, 2004 6:00 PM
To: Corpora list
Subject: [Corpora-List] Re: Chomsky
I don't have the original by Leech, but here is what I have in my files:
"Any natural corpus will be skewed. Some sentences won't occur because they
are obvious, others because they are false, still others because they are
impolite. The corpus, if natural, will be so wildly skewed that the
description would be no more than a mere list."
(Chomsky in Leech, The State of the Art in Corpus Linguistics, 1991, p. 8)
Shlomo Izre'el
On Oct 14, 2004, at 4:08 PM, Bob Knippen wrote:
Mª Belén Díez Bedmar wrote:
> I'm looking for the exact bibliographical reference where we can find
> Chomsky's idea that a corpus presents a language that is defective or
> corrupted.
To my knowledge, he never says any such thing.
He does say, in several places (Syntactic Structures, 1957 comes to
mind), that corpora do not provide the kind of information about
linguistic competence that Linguistics ought to be after.
In particular, he says that corpora do not provide information about
what is ungrammatical, and he says something to the effect that
corpora, being finite, do not shed light on the infinite generative
capacity of language. (That is, a statistical model based on a
particular corpus is not a model of the language in general).
I very much doubt he wrote that a corpus presents a language that is
defective or corrupted.
Bob
--
Bob Knippen
Computer Science Department
110 Volen Center
Mail Stop 018
Brandeis University
415 South Street
Waltham, MA 02254-9110
781-736-2745
http://www.cs.brandeis.edu/~knippen
_______________________________________________________
Shlomo Izre'el
Professor of Semitic Linguistics
Department of Hebrew and Semitic Languages
Webb Building #516
Tel Aviv University
POB 39040
IL-61390 Tel Aviv
Israel
Tel. +972-3-640 5016
Fax. +972-3-640 7031
     +972-3-640 9457

Home address:
Simtat Neve-Tsedek 7
IL-65154 Tel Aviv
Israel
Tel. +972-3-517 5341
Fax. +972-3-510 1867
izreel at post.tau.ac.il
http://www.tau.ac.il/humanities/semitic/izreel.html
The Corpus of Spoken Israeli Hebrew:
http://www.tau.ac.il/humanities/semitic/maamad.html (Hebrew text)
http://www.tau.ac.il/humanities/semitic/cosih.html (English text)