Corpora: Chomsky and corpus linguistics

Mcenery, Tony eiaamme at exchange.lancs.ac.uk
Mon Apr 9 18:06:27 UTC 2001


Hi,

just a brief note to add to the thread before I pop off for an Easter break. I
think I will stop trading Biblical allusions - never a good idea to try that
with an SIL staffer! Instead, let me make two points (these will not respond
directly to Mike M's reply to me as Mike B has dealt with much of that).

1.) On Ramesh's suggestion that corpus linguistics should define itself
independently of Chomskyan linguistics: I think that this is possible -
certainly when one is really aiming at an entirely different theory of
language, e.g. those who work in a Firthian tradition. However, I guess I am
not the only corpus linguist who does not work in that tradition, and I have an
interest in engaging with Chomskyan linguistics as it is a very dominant
paradigm in linguistics. Hence my interest in defining corpus linguistics with
respect to that paradigm.

2.) I thought I would add a note on my position regarding Chomsky and the nature
of data in linguistics. I think Chomsky is - rather ironically - a key figure
in the development of corpus linguistics. Criticisms of corpus data arising
from the tradition he established have led to ideas such as balance and
representativeness being employed to counter criticisms such as skew. So
Chomsky moulded at least some of the current approaches to the construction and
use of corpora.

With reference to how corpus linguistics and intuition-based linguistics can
interface, I think Charles Fillmore got this spot on in his Nobel Symposium
paper - the two can interact very gainfully. For example, when one needs a
large-scale description of a language, with an emphasis on what is and is not
typical in that language, a corpus is a good tool to use and linguistic
intuition probably is not. When one wants to test a theoretical model to
destruction, a corpus can also be a good tool. But beyond the finite corpus I
may also want to think of sentences which represent a challenge to my theory.
These sentences have not occurred in the corpus but are in my view possible
sentences of the language. Especially in the case where one is testing a theory
with admittedly extreme examples, I have no problem with convoluted 'invented'
examples. While I do usually get as sniffy as most other corpus linguists about
invented examples, there have been times in my own work where an invented
example has had to be used under precisely these circumstances (old stuff on
theories of pragmatics). Where I do have a problem with the use of such
examples is where they are used in preference to a corpus where a corpus should
clearly be used. Maybe this is a 'common sense' response to a fundamental
problem in linguistics, but it seems to have real merit to me.

Tony


