[Corpora-List] corpus linguistics

John F. Sowa sowa at bestweb.net
Sun Sep 16 03:54:20 UTC 2007


Charles and Mike,

CM> I heard Chomsky speak a couple of years ago and comment that
 > people who analyze corpora were engaging in the equivalent of
 > pre-Galilean physics.  He also stated that real data was so
 > noisy that it should basically be ignored...

That is proof that Chomsky has not paid attention to how real
science is done.  Very few areas of science have the luxury of
being able to control conditions as precisely as Galileo is said
to have done (and there is speculation that Galileo "cheated").

Organic chemistry, for example, is notorious for being the
science of "side effects".  Astronomy has made enormous progress
by analyzing "noisy data", since it's impossible to control the
starting conditions for the universe.  Biologists have a saying
that any organism from a bacterium to a gorilla, under controlled
conditions, "will do what it damn well pleases".

Since linguistics studies the behavior of organisms that "do what
they damn well please", noisy data is the only *real* data.

MM> ... there are lots of generative theories.  Besides the various
 > incarnations of Chomskian theories, there are LFG, GPSG (since
 > superseded by HPSG, which is surely a generative theory), and
 > probably several other theories, the proponents at least of which
 > would call themselves generativists.

All those theories have been influenced by ideas introduced by
Chomsky, but there is no reason why the people who test them
should follow Chomsky's advice about picking and choosing data.

Analyzing the "intuition" of a native speaker is what physicists
call a "Gedanken experiment".  That's an excellent way to formulate
hypotheses, but empirical science must take the next step of
reformulating the elegant imaginary experiments into methods
that produce and analyze very noisy real data.

John Sowa





