Corpora: Chomsky and corpus linguistics

John A. Goldsmith ja-goldsmith at uchicago.edu
Sat Apr 28 00:57:00 UTC 2001


Ramesh wrote:
>>"What is possible" seems to require a binary
>>yes/no type of answer, "what is probable" suggests
>>a cline or spectrum. Language is a part of human
>>behaviour, and almost everything seems to be possible
>>within human behaviour.

Mike Maxwell wrote in reply:
>But that's the point of introspective grammatical judgements: they are
>binary, and *not* everything is possible.... Putting this differently,
>there really are things that are *not* possible sentences in English, even
>though we sometimes know immediately what they would mean if they were
>grammatical.  "Whose did you find book?", "What are you afraid that
>happened?", "Who do you wonder whether will go?" etc.  And there are
>things that we know aren't English, unless we twiddle the grammar a bit.
>My favorite example is the sentence from Catch-22, "They disappeared him."
>As one of the characters says in the novel says, it's not English, but...

Speaking as someone who not only believed, but _taught_, what Mike Maxwell
says, and no longer does, I would offer this remark: there is little
or no convincing evidence of a fundamental divide between
grammatical and ungrammatical sentences. What distinguishes those people
who believe in such a divide from those who do not is just those
beliefs: beliefs, opinions, preferences and aesthetics. And those who
do not believe in it feel (but now with rational grounds) that their own
inferences and conclusions and language are more robust and less likely
to be based on faulty premises.

By the way, I think it would be a great mistake (even if we did believe
in a grammar that draws a sharp in/out distinction) to put "they disappeared
him" in the Out group! ... and thus it goes, for those who feel obliged
to make such decisions.

Mike Maxwell wrote:
>True, but a description =\= an explanation.  Generative linguistics is
>trying to find an explanation.  Whether you believe they have (or ever
>will) is of course another question; but at least by your (Ramesh's)
>description, corpus linguistics isn't even trying to find an explanation
>(unless you believe that our brains are HMMs or something).

I won't speak for corpus linguistics, but I hope it is clear to all
concerned that a perfectly respectable scientific theory of language (even
possessed of the right to say that it provides an _explanation_) can be
based on the statement that the goal of the analysis is to provide
a probability distribution over V* (where V is the vocabulary of the
language), i.e., over all possible strings of words. One can in turn judge
between such distributions by seeing what probabilities they assign
to actual, existing corpora: the theory that assigns the highest
probability wins. (I gloss over issues of theory description length,
of course.)
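
To make the comparison concrete, here is a minimal sketch in Python, under
assumptions that are mine and not part of the argument above: a toy corpus,
a smoothed unigram model and a uniform model as the two competing
"theories", and an end-of-string marker so that each model is a proper
distribution over V*. All names (END, train_unigram, uniform_model,
log_prob) are hypothetical, and description length is ignored, as in the
paragraph above.

```python
import math
from collections import Counter

END = "</s>"  # end-of-string marker; makes the model a distribution over V*


def train_unigram(sentences, vocab, alpha=1.0):
    """Unigram probabilities over vocab + END, with add-alpha smoothing."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
        counts[END] += 1
    symbols = sorted(vocab | {END})
    total = sum(counts[w] for w in symbols) + alpha * len(symbols)
    return {w: (counts[w] + alpha) / total for w in symbols}


def uniform_model(vocab):
    """Every symbol in vocab + END is equally likely."""
    symbols = vocab | {END}
    return {w: 1.0 / len(symbols) for w in symbols}


def log_prob(model, sentences):
    """Base-2 log-probability that the model assigns to a corpus."""
    return sum(
        math.log2(model[w]) for sentence in sentences for w in list(sentence) + [END]
    )


# Toy data (an assumption for illustration only).
train = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]
test = [["the", "dog", "sleeps"], ["the", "cat", "barks"]]
vocab = {w for sentence in train + test for w in sentence}

models = {"unigram": train_unigram(train, vocab), "uniform": uniform_model(vocab)}
scores = {name: log_prob(model, test) for name, model in models.items()}

# The theory that assigns the highest probability to the corpus wins.
winner = max(scores, key=scores.get)
print(scores, winner)
```

On this toy data the unigram model assigns the test corpus a higher
probability than the uniform model does, so it "wins" in the sense
described above; with real corpora one would compare far richer models in
the same way.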

John Goldsmith


