[Corpora-List] Chomsky and computational linguistics

Mike Maxwell maxwell at umiacs.umd.edu
Wed Jul 11 02:39:51 UTC 2007


Philip Resnik wrote:
> ...there's an emerging community trying to use the
> tools of computational linguistics with corpora to enable linguistic
> theoreticians to be more empirical in their approach without
> abandoning their paradigm.    
> ...
> Work of this kind offers value to theoretical linguists because the
> standard paradigm of inventing and judging examples can easily miss
> relevant facts, and lead to false generalizations.

If I were doing theoretical syntax, I think this is exactly where I'd 
find myself: using corpora to keep me from missing relevant facts, that 
is, examples from people whose judgements might differ from my own (like 
my spell checker differs from me about how 'judgement' should be 
spelled, but that's another question...), or making me consider 
constructions that I might otherwise have overlooked.

There is a danger in corpus linguistics of conflating dialects, using 
texts produced by non-native speakers, etc.  And crucial constructions 
may simply be so rare that you just can't find them in the available 
corpus.  (Certain kinds of reduplication in "exotic" languages are a case 
in point: the crucial examples are hard to find in a corpus, and that's 
not even syntax.)  So there's still room, it seems to me, for 
introspection (or asking the person in the next office, or elicitation 
from an informant).  But as you say, the corpora add value, too.

(Other kinds of linguistics, like lexicography, have been about corpus 
collection for centuries, of course.)
-- 
	Mike Maxwell
	maxwell at umiacs.umd.edu
	"Theorists...have merely to lock themselves in a room
	with a blackboard and coffee maker to conduct their business."
	--Bruce A. Schumm, Deep Down Things


