[Corpora-List] Chomsky and computational linguistics
John A Goldsmith
goldsmith at uchicago.edu
Wed Jul 11 08:21:12 UTC 2007
Mike, I pretty much agree with you, but I think it's worth reflecting on
the proposition that not all data is alike, and some is more equal than
others. In the case of generative syntax, as it's been done over the
last nearly 50 years, most of the data that's been used, I would guess,
has been about phenomena where there would have been no problem in
finding examples in real corpora, and a search for examples in real
corpora might have suggested new insights. In a relatively small number
of areas---I label these "the parasitic gaps phenomena" in my
mind---examples are so rare that corpus search may give one too poor a
harvest, so if one thinks of these phenomena as having an importance
that offsets their rarity in data, one is forced to go the intuition
route.
What would you say to the following credo for syntacticians: using one's
own native speaker intuitions is a useful heuristic for developing
hypotheses, but these judgments should be used only as a last resort as
the basis for a serious argument (in a publication, thesis, etc.). Since
science consists both of finding elegant generalizations and showing
that they correspond well to the facts rather than what we would like
the facts to be, the linguist is obliged to submit his/her theories to
the best tests currently available. In the relatively rare case where
the issue generates only hypotheses that are difficult to test in data
that has a reality beyond the linguistic imagination of the linguist,
that lack of empirical grounding should be established and acknowledged.
I think there are serious objections to this with regard to phonology
and morphology, so I expect some with regard to syntax as well. But I
think some good arguments can be made for the position I sketched --
best,
John Goldsmith
Mike Maxwell wrote:
> Philip Resnik wrote:
>> ...there's an emerging community trying to use the
>> tools of computational linguistics with corpora to enable linguistic
>> theoreticians to be more empirical in their approach without
>> abandoning their paradigm. ...
>> Work of this kind offers value to theoretical linguists because the
>> standard paradigm of inventing and judging examples can easily miss
>> relevant facts, and lead to false generalizations.
>
> If I were doing theoretical syntax, I think this is exactly where I'd
> find myself: using corpora to keep me from missing relevant facts,
> that is, examples from people whose judgements might differ from my
> own (like my spell checker differs from me about how 'judgement'
> should be spelled, but that's another question...), or making me
> consider constructions that I might otherwise have overlooked.
>
> There is a danger in corpus linguistics of conflating dialects, using
> texts produced by non-native speakers, etc. And crucial constructions
> may simply be so rare that you just can't find them in the available
> corpus. (Crucial examples of certain kinds of reduplication in
> "exotic" languages are an example of hard-to-find data in a corpus,
> and that's not even syntax.) So there's still room, it seems to me,
> for introspection (or asking the person in the next office, or
> elicitation from an informant). But as you say, the corpora add
> value, too.
>
> (Other kinds of linguistics, like lexicography, have been about corpus
> collection for centuries, of course.)