Corpora: Chomsky and corpus linguistics

ramesh at clg.bham.ac.uk
Fri Apr 27 23:46:44 UTC 2001


Mike Maxwell writes:
>Putting this differently, there really are things that are *not*
>possible sentences in English, even though we sometimes know immediately
>what they would mean if they were grammatical.

You change the criterion from "possible" in the first clause to
"grammatical" in the second one. Many instances in corpus data
may be ungrammatical, but are necessarily possible, because they have
actually occurred. The universes of grammaticality and
possibility therefore do not share the same circumference.

Mike Maxwell writes:
>And there are things that we know aren't English, unless
>we twiddle the grammar a bit. My favorite example is the sentence from
>Catch-22, "They disappeared him."  As one of the characters in the
>novel says, it's not English, but...

So now, "English" = "grammar".
It is English; it's the grammar that is inadequate.

cf. Michael Halliday (1993):
"The Chomskyan position on induction is closely related to the langue-parole
and competence-performance distinctions. But what such frequency data make
very clear is the ultimate inseparability of system and use."

cf. Robert de Beaugrande: Large Corpus Linguistics and Applied Linguistics:
Dedicating new Bridges:
"So to discover the `deeper' or `underlying' order of language (called
`langue', `competence', `deep structure', `I-language', etc.) linguistics
should *take it back out of use* (called `langage/parole', `performance',
`surface structure', `E-language', etc.). Yet doing so in effect tends to
*replace language* with *ideal language* which exists nowhere except in
some `linguistic theory', although it is boldly offered as an `explanation'
of `language' in general and of much else besides, such as `human language
acquisition' (Beaugrande 1997b, 1998a and 1998b)."
[see http://www.beaugrande.com/]

Mike Maxwell writes:
>True, but a description =3D\=3D an explanation.  Generative linguistics
>is trying to find an explanation.  Whether you believe they have (or
>ever will) is of course another question; but at least by your
>(Ramesh's) description, corpus linguistics isn't even trying to find an
>explanation (unless you believe that our brains are HMMs or something).

Sorry, something got lost in the email system at this point. I don't know
what "=3D\=3D" was meant to be...
Anyway, I did not say that corpus linguistics "isn't even trying to find
an explanation". Surely description (or at least a methodology/apparatus for
description) has to precede (a methodology/apparatus for) explanation?
And the better the description, the more robust the explanation can be.

A bottom-up methodology will necessarily take longer to arrive at
high-level abstractions, whereas a top-down methodology starts with them.
This is why it is easiest to criticize top-down methods by criticizing
the examples they choose to work with.

The explanation which corpus linguistics eventually arrives at will have to
include factors beyond formal grammar, and even non-linguistic factors, as
these affect the situations and contexts in which the corpus instances were
produced.

You seem to want to squeeze language until it conforms to your grammar,
rejecting any instances of language that cannot be so squeezed by calling
them "ungrammatical" or, worse still, "not English".
I, by contrast, want to describe language in terms of grammar and other
linguistic systems, amending the systems wherever necessary so that they
can include the vast majority of the corpus data (the actual and the
probable which it predicts), while allowing that some small proportion of
the data may remain beyond the descriptive or explanatory scope/power of
these systems.


Best
Ramesh

Ramesh Krishnamurthy
Consultant, COBUILD, Collins Dictionaries and Bank of English corpus
Honorary Research Fellow, University of Birmingham
Honorary Research Fellow, University of Wolverhampton


