Corpora: Chomsky/Harris

Stefan Th. Gries StThGries at t-online.de
Mon Apr 2 09:14:25 UTC 2001


I would agree with most of what Michael wrote in his recent posting,
especially when it comes to the relation between Chomskyan linguistics and
corpus-based approaches. While I would not consider them the enemy at MIT, I
nevertheless believe that Chomskyan linguistics and much of what I see as
corpus linguistics are widely disparate domains that differ on many issues
defining scientific research agendas:
- the issue of what counts as data;
- the issue of how data are analysed;
- the range of questions that is considered amenable to fruitful
investigation;
- the range of questions actually being investigated.
Put differently, I think many, if not most, corpus linguists investigate
completely different things on the basis of completely different data and
methods of analysis. I am not sure whether this gap is impossible to bridge
or not, but I am sure that, for a large number of research questions, each
side has little to offer to the other.
    To give one example, I recently attended a predominantly generative
conference (admitting me there shows that there need not be any such enmity
as hinted at in previous postings!) and gave a talk on a corpus-based
approach to the influence of processing on Preposition Stranding (PS) in
English, PS also being at least touched upon by other presenters. While I of
course do not claim that my work can represent corpus linguistics as a
whole, the discrepancies between the corpus-based approach and the other,
generative, analyses were obvious (and also mentioned in some comments from
the audience):
- my analysis was corpus-based, although acceptability judgements on the
part of the investigating linguist could perhaps also have provided similar
answers (which I find highly unlikely, given my multifactorial/statistical
approach);
- my analysis was only concerned with performance;
- my analysis did not focus on any structural representation (not to say,
tree diagram) of the phenomenon under consideration that was compatible with
recent/contemporary analyses.
On the other hand, from a corpus-linguistic point of view at least, it was
difficult for me to evaluate the generative analyses presented - not because
they were not sound in the framework presented, but because they asked
questions completely different from mine and, e.g., exhibited a range of
(from my point of view) acceptability judgements of sentences that were, in
the case of two presenters, re-interpreted after every question from the
audience.
    To cut a long story short, what is it that we have to offer to
generative theory if we frequently talk about frequencies, probabilistic
processes etc. - and what does generative theory have to offer to us, given
the steadily increasing number of empty nodes and functional categories and
the neglect of the linguistic material that is actually being produced?


