[Corpora-List] Chomsky and computational linguistics

John F. Sowa sowa at bestweb.net
Wed Jul 4 17:15:03 UTC 2007


Mike,

That's a fair question:

JFS>> It is truly sad when a man who had taught us all a great deal at
 >> one time long ago has walled himself off from any input that might
 >> raise questions he had decided to ignore five decades ago.

MM> Let me take the devil's advocate position here, in hopes of
 > provoking some discussion.  What is the evidence from corpora
 > that would raise these questions, and what are the questions
 > Chomsky is ignoring?

Let's go back 50 years to Chomsky's _Syntactic Structures_, which
was an excellent book for its time (and the most readable book that
C. ever wrote).  Following are some of the earlier developments
that he was arguing for and/or against:

  1. Immediate constituent analysis (i.e., parsing by the equivalent
     of context-free grammars), which was recommended by Wells (1947)
     and implemented in some early MT systems.

  2. Transformational grammar by Harris (1951), who had argued that
     immediate constituent (IC) analysis was insufficient to capture
     all the generalizations in syntax and that a transformational
     layer on top of ICs was necessary to relate, for example, the
     active and passive forms of verbs.

  3. Grammar discovery procedures, which involved the analysis of
     large (for the time) corpora in order to derive a grammar.
     The most impressive work of those years was an analysis of
     a corpus of fifty hours of spoken English by Fries (1952).
     To minimize preconceptions, Fries avoided traditional labels
     for parts of speech, and assigned meaningless letters and
     numbers to the classes of words that could be inserted in the
     slots of various co-occurrence patterns.

  4. Statistical methods, especially those derived from Shannon (1948;
     Shannon & Weaver 1949), who introduced information theory and
     discussed Markov chains and the analysis of text by N-grams of
     letters or words.

  5. Integration of syntax and semantics, especially in MT systems.
     One of the pioneering groups in MT and Comp. Ling. was the
     Cambridge Language Research Unit, founded in the early 1950s
     by Margaret Masterman.  CLRU was a fertile breeding ground for
     pioneers in both theoretical and computational linguistics.
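
As a concrete illustration of the N-gram analysis mentioned in the
list above, here is a minimal sketch of a word-bigram Markov chain.
The toy corpus and the function names are assumptions made up for
this example; they do not come from Shannon or from any of the
systems cited.

```python
import random
from collections import defaultdict

def train_bigrams(tokens):
    """Map each word to the list of words observed right after it."""
    successors = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        successors[current].append(following)
    return dict(successors)

def generate(successors, start, max_words, rng):
    """Random walk over the chain; stop at a word with no successor."""
    words = [start]
    while len(words) < max_words and words[-1] in successors:
        words.append(rng.choice(successors[words[-1]]))
    return words

# Toy corpus (an assumption for illustration only).
corpus = "the dog saw the cat and the cat saw the bird".split()
model = train_bigrams(corpus)
print(" ".join(generate(model, "the", 8, random.Random(0))))
```

Every adjacent pair in the output was seen in the corpus, but nothing
in the model checks the sentence as a whole.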

In _Syntactic Structures_, Chomsky followed his former teacher,
Zellig Harris, in adopting transformations as a layer on top of
IC analysis.  His major innovation was to adopt production rules
by Post (1943, 1947) as the notation for representing IC rules,
which C. called phrase-structure rules.  As working hypotheses,
C. stated the following principles:

  1. A language is a set of sentences defined by a formal grammar
     stated in a mathematical notation.  This assumption was not
     derived from Harris, but more likely from Quine and Goodman,
     who, C. said in the preface, "strongly influenced" him.

  2. "Grammar is best formulated as a self-contained study
     independent of semantics" (p. 106).  This assumption is
     typical of the practice in formal logic, but not in the
     linguistics of the 1950s.  Roman Jakobson, for example,
     countered, "Syntax without semantics is meaningless."

  3. A rejection of Markov processes as inadequate to generate
     all and only the grammatical sentences of a language.
     This assumption follows from the definition in point #1,
     but it does not imply that Markov processes cannot be
     useful for recognizing major chunks of a sentence.

  4. A rejection of phrase-structure grammars as inadequate, not
     because they could not generate all sentences of a language,
     but because they could not express common generalizations
     (such as active-passive transformations).

  5. A two-level approach with a phrase-structure component for
     generating kernel sentences and a transformational component
     for combining and transforming the kernel sentences.  This
     approach is effectively equivalent to Harris's, but with
     different notation and terminology.

  6. "External conditions of adequacy... e.g., the sentences will
     have to be acceptable to the native speaker" (p. 49).

  7. "Condition of generality... we require that the grammar of a
     given language be constructed in accordance with a specific
     theory of structure... independently of any particular
     language" (p. 50).
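
The two-level approach in the list above can be sketched as follows:
a handful of phrase-structure rules generate kernel sentences, and a
separate passive transformation rearranges them.  The rule set, the
PARTICIPLE table, and the function names are simplifying assumptions
for this example (tense and auxiliaries are collapsed into "was");
this is not the grammar as C. stated it.

```python
from itertools import product

# Toy phrase-structure rules (an assumption for illustration).
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["T", "N"]],
    "VP": [["V", "NP"]],
    "T":  [["the"]],
    "N":  [["man"], ["ball"]],
    "V":  [["hit"], ["took"]],
}

def expand(symbol):
    """Return every terminal string derivable from symbol."""
    if symbol not in RULES:
        return [[symbol]]  # terminal word
    results = []
    for rhs in RULES[symbol]:
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

# Past participles for the two toy verbs.
PARTICIPLE = {"hit": "hit", "took": "taken"}

def passive(sentence):
    """NP1 V NP2 -> NP2 was V-en by NP1, for five-word kernels only."""
    np1, verb, np2 = sentence[:2], sentence[2], sentence[3:]
    return np2 + ["was", PARTICIPLE[verb], "by"] + np1

kernels = expand("S")
for kernel in kernels:
    print(" ".join(kernel), "=>", " ".join(passive(kernel)))
```

The phrase-structure rules never mention the passive; the relation
between the two forms is stated once, in the transformation.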

As guidelines, these principles led to a great deal of fruitful
research, but Chomsky fossilized them as dogma that ruled out
an even larger body of potentially much more fruitful research.

Harris, for example, would certainly object to point #6, since
he had written a grammar of Phoenician for his PhD dissertation
despite the lack of any native speakers.  Point #7 also presumes
that a universally acceptable theory of structure can be decreed
even before all languages have been studied.  Those two points
led to the abandonment of many projects, such as the one by Fries,
that were based on corpora.  (Fortunately, linguists who worked
in the field with indigenous languages ignored those points.)

Point #1, which implies that syntax alone must determine the
set of permissible sentences, distorts everything else.  Instead
of using a two-level syntax, most computational systems do the
parsing with a context-free grammar and achieve the effect of
transformations in the mapping to a semantic representation.
Theoreticians ranging from Montague to the generative semanticists
did something similar, but many of them suffered badly in the
so-called "linguistic wars."

Chomsky's rejection of statistics not only affected theoretical
linguistics, it even spread to AI and comp. ling.  As an example,
a colleague of mine at IBM, Eva Mueckstein, had written a PhD
dissertation in grammar theory:  she showed how a context-free
grammar combined with a finite-state control for selecting which
CF rules to apply could provide the equivalent of a
context-sensitive grammar.  She was hired by Fred Jelinek, who gave
her the task of adding probabilities to the FS control.  After
getting some promising results, she submitted a paper to IJCAI
in 1981.  But the paper was rejected with the curt reply,
"Statistics is not AI."
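
To illustrate the general idea of a finite-state control over
context-free rules, here is a minimal sketch using the standard
textbook example of a regularly controlled grammar.  It is not a
reconstruction of the dissertation's formalism; the rule names and
state names are hypothetical.  Each rule is context-free, but the
control forces the three "grow" rules to fire in lockstep, so the
grammar derives a^n b^n c^n, which no context-free grammar can
generate on its own.

```python
# Hypothetical rule names; each context-free rule rewrites one nonterminal.
RULES = {
    "start":  ("S", "ABC"),
    "grow_a": ("A", "aA"),
    "grow_b": ("B", "bB"),
    "grow_c": ("C", "cC"),
    "stop_a": ("A", "a"),
    "stop_b": ("B", "b"),
    "stop_c": ("C", "c"),
}

# Finite-state control: state -> {permitted rule: next state}.
CONTROL = {
    "q0": {"start": "q1"},
    "q1": {"grow_a": "q2", "stop_a": "q4"},
    "q2": {"grow_b": "q3"},
    "q3": {"grow_c": "q1"},
    "q4": {"stop_b": "q5"},
    "q5": {"stop_c": "done"},
}

def run(rule_names):
    """Apply the rules in order, rejecting any step the control forbids."""
    form, state = "S", "q0"
    for name in rule_names:
        allowed = CONTROL.get(state, {})
        if name not in allowed:
            raise ValueError(f"control forbids {name!r} in state {state!r}")
        lhs, rhs = RULES[name]
        form = form.replace(lhs, rhs, 1)  # rewrite the one occurrence of lhs
        state = allowed[name]
    if state != "done":
        raise ValueError("control did not reach its final state")
    return form

def derive(n):
    """Derive the string a^n b^n c^n for n >= 1."""
    cycle = ["grow_a", "grow_b", "grow_c"]
    return run(["start"] + cycle * (n - 1) + ["stop_a", "stop_b", "stop_c"])

print(derive(3))
```

Attaching a probability to each transition in CONTROL would be one
way to turn such a control into a statistical model.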

In the late 1980s, another colleague, who still believed in Chomsky,
said "But we don't know much about semantics."  That was thirty years
after the MT work on semantics and seventeen years after Montague.
Even worse, it was over six centuries after Ockham (1323) had written
a semantic analysis of the entire Latin language (which would, even
today, be an excellent introduction to model-theoretic semantics,
especially for linguists who are terrified by Montague's notation).

At the end of the preface, Chomsky cited the support he received from
the U.S. Army, Navy, and Air Force.  Those payments were part of the
money MIT received for work on machine translation.  If Chomsky had
done the work he was being paid for, he could have learned a lot about
how language actually works.  Both MT and theoretical linguistics
might have benefited enormously.

John Sowa

--------------------------------------------------------------------

References:

Chomsky, Noam (1957) _Syntactic Structures_, Mouton, The Hague.

Fries, Charles Carpenter (1952) _The Structure of English_,
Harcourt, Brace & World, New York.  For a tribute to Fries, see
http://itre.cis.upenn.edu/~myl/languagelog/archives/003743.html

Harris, Zellig (1951) _Methods in Structural Linguistics_,
University of Chicago Press, Chicago.  For a tribute to Harris, see
http://www.dmi.columbia.edu/zellig/

Masterman, Margaret (2006) _Language, Cohesion and Form_, edited
by Yorick Wilks, Cambridge University Press.  For a review, see
http://www.jfsowa.com/pubs/mmb_rev.htm

Ockham, William of (1323) Summa Logicae. _Ockham's Theory of Terms_,
translation of Part I by M. J. Loux, University of Notre Dame Press, 
Notre Dame, IN, 1974. _Ockham's Theory of Propositions_, translation
of Part II by A. J. Freddoso & H. Schuurman, University of Notre Dame 
Press, Notre Dame, IN, 1980.

Post, Emil L. (1943) "Formal reductions of the general combinatorial
decision problem," _American Journal of Mathematics_, 65, 197-215.

Post, Emil L. (1947) "Recursive unsolvability of a problem of Thue,"
_Journal of Symbolic Logic_, 12, 1-11.  Reprinted in M. Davis, ed.,
_The Undecidable_, Raven Press, Hewlett, NY, 1965, pp. 293-303.

Shannon, Claude E. (1948) "A mathematical theory of communication,"
_The Bell System Technical Journal_, Vol. 27, pp. 379–423, 623–656.
http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

Shannon, Claude E., & Warren Weaver (1949) _The Mathematical Theory
of Communication_, Univ. of Illinois Press, Urbana.

Wells, Rulon (1947) "Immediate constituents," _Language_, 23, 81-117.


