[Corpora-List] Chomsky and computational linguistics
John F. Sowa
sowa at bestweb.net
Wed Jul 4 17:15:03 UTC 2007
Mike,
That's a fair question:
JFS>> It is truly sad when a man who had taught us all a great deal at
>> one time long ago has walled himself off from any input that might
>> raise questions he had decided to ignore five decades ago.
MM> Let me take the devil's advocate position here, in hopes of
> provoking some discussion. What is the evidence from corpora
> that would raise these questions, and what are the questions
> Chomsky is ignoring?
Let's go back 50 years to Chomsky's _Syntactic Structures_, which
was an excellent book for its time (and the most readable book that
C. ever wrote). Following are some of the earlier developments
that he was arguing for and/or against:
1. Immediate constituent analysis (i.e., parsing by the equivalent
of context-free grammars), which was recommended by Wells (1947)
and implemented in some early MT systems.
2. Transformational grammar by Harris (1951), who had argued that
immediate constituent (IC) analysis was insufficient to capture
all the generalizations in syntax and that a transformational
layer on top of ICs was necessary to relate, for example, the
active and passive forms of verbs.
3. Grammar discovery procedures, which involved the analysis of
large (for the time) corpora in order to derive a grammar.
The most impressive work of those years was an analysis of
a corpus of fifty hours of spoken English by Fries (1952).
To minimize preconceptions, Fries avoided traditional labels
for parts of speech and assigned meaningless letters and
numbers to the classes of words that could be inserted in the
slots of various co-occurrence patterns.
4. Statistical methods, especially those derived from Shannon (1948;
Shannon & Weaver 1949), who introduced information theory and
discussed Markov chains and the analysis of text by N-grams of
letters or words.
5. Integration of syntax and semantics, especially in MT systems.
One of the pioneering groups in MT and Comp. Ling. was the
Cambridge Language Research Unit, founded in the early 1950s
by Margaret Masterman. CLRU was a fertile breeding ground for
pioneers in both theoretical and computational linguistics.
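Shannon's treatment of text as a Markov chain over N-grams can be illustrated with a word-bigram model. This is a minimal sketch, assuming a hypothetical three-sentence corpus, maximum-likelihood estimates, and no smoothing; it is not taken from Shannon's paper.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count word bigrams, padding each sentence with <s> and </s> boundary markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    following = counts.get(prev)
    if not following:
        return 0.0
    return following[word] / sum(following.values())

def generate(counts, rng, max_len=20):
    """Walk the Markov chain from <s>, sampling each next word in proportion to its count."""
    word, output = "<s>", []
    while len(output) < max_len:
        following = counts[word]
        word = rng.choices(list(following), weights=list(following.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

# Hypothetical toy corpus; a real study would use a far larger one.
corpus = ["the dog chased the cat", "the cat saw the dog", "a dog barked"]
counts = train_bigrams(corpus)
print(probability(counts, "the", "dog"))   # → 0.5 (2 of the 4 words after "the")
print(generate(counts, random.Random(0)))
```

The same counting works for letters instead of words, or for longer N-grams by keying the table on tuples of the previous N-1 items.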
In _Syntactic Structures_, Chomsky followed his former teacher,
Zellig Harris, in adopting transformations as a layer on top of
IC analysis. His major innovation was to adopt production rules
by Post (1943, 1947) as the notation for representing IC rules,
which C. called phrase-structure rules. As working hypotheses,
C. stated the following principles:
1. A language is a set of sentences defined by a formal grammar
stated in a mathematical notation. This assumption was not
derived from Harris, but more likely from Quine and Goodman,
who, C. said in the preface, "strongly influenced" him.
2. "Grammar is best formulated as a self-contained subject
independent of semantics" (p. 106). This assumption is
typical of the practice in formal logic, but not in the
linguistics of the 1950s. Roman Jakobson, for example,
countered "Syntax without semantics is meaningless."
3. A rejection of Markov processes as inadequate to generate
all and only the grammatical sentences of a language.
This assumption follows from the definition in point #1,
but it does not imply that Markov processes cannot be
useful for recognizing major chunks of a sentence.
4. A rejection of phrase-structure grammars as inadequate, not
because they could not generate all sentences of a language,
but because they could not express common generalizations
(such as active-passive transformations).
5. A two-level approach with a phrase-structure component for
generating kernel sentences and a transformational component
for combining and transforming the kernel sentences. This
approach is effectively equivalent to Harris's, but with
different notation and terminology.
6. "External conditions of adequacy... e.g., the sentences will
have to be acceptable to the native speaker" (p. 49).
7. "Condition of generality... we require that the grammar of a
given language be constructed in accordance with a specific
theory of structure... independently of any particular
language" (p. 50).
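To make the split between a phrase-structure component and a transformational component concrete, here is a toy sketch, not Chomsky's own formalism. The rules are loosely modeled on the example grammar in chapter 4 of _Syntactic Structures_ (Sentence -> NP + VP, NP -> T + N, VP -> Verb + NP); the PARTICIPLES table is a hypothetical stand-in for a morphological component, and the passive rule ignores Aux and affix hopping.

```python
import random

# Toy phrase-structure rules: each symbol rewrites as one of several sequences.
RULES = {
    "Sentence": [["NP", "VP"]],
    "VP": [["Verb", "NP"]],
    "NP": [["T", "N"]],
    "T": [["the"]],
    "N": [["man"], ["ball"]],
    "Verb": [["hit"], ["took"]],
}

# Hypothetical past-participle table standing in for morphology.
PARTICIPLES = {"hit": "hit", "took": "taken"}

def derive(symbol, rng):
    """Rewrite symbol top-down; return a tree of (label, children) tuples."""
    if symbol not in RULES:                      # terminal word
        return symbol
    expansion = rng.choice(RULES[symbol])
    return (symbol, [derive(s, rng) for s in expansion])

def words(tree):
    """Read the terminal string off a tree."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [w for child in children for w in words(child)]

def passive(kernel):
    """Simplified passive: NP1 Verb NP2 -> NP2 was Verb+en by NP1."""
    _, (np1, vp) = kernel
    _, (verb, np2) = vp
    _, (v,) = verb
    return words(np2) + ["was", PARTICIPLES[v], "by"] + words(np1)

kernel = derive("Sentence", random.Random(1))   # phrase-structure component
print(" ".join(words(kernel)))                  # kernel sentence
print(" ".join(passive(kernel)))                # transformed sentence
```

The transformation operates on the tree rather than the word string, which is what lets it refer to NP1 and NP2 by their structural positions.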
As guidelines, these principles led to a great deal of fruitful
research, but Chomsky fossilized them as dogma that ruled out
an even larger body of potentially much more fruitful research.
Harris, for example, would certainly object to point #6, since
he had written a grammar of Phoenician for his PhD dissertation
despite the lack of any native speakers. Point #7 also presumes
that a universally acceptable theory of structure can be decreed
even before all languages have been studied. Those two points
led to the abandonment of many projects, such as the one by Fries,
that were based on corpora. (Fortunately, linguists who worked
in the field with indigenous languages ignored those points.)
Point #1, which implies that syntax alone must determine the
set of permissible sentences, distorts everything else. Instead
of using a two-level syntax, most computational systems do the
parsing with a context-free grammar and achieve the effect of
transformations in the mapping to a semantic representation.
Theoreticians ranging from Montague to the generative semanticists
did something similar, but many of them suffered badly in the
so-called "linguistic wars."
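The alternative described above, a context-free parse followed by a mapping to semantics, can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular system: the grammar and the LEMMAS table are hypothetical, the parser is a simple backtracking top-down parser (adequate only for grammars without left recursion), and the "semantic representation" is just a (predicate, agent, patient) triple.

```python
# Hypothetical CF grammar: active and passive clauses are separate CF rules;
# there is no transformational component.
GRAMMAR = {
    "S": [["NP", "VPact"], ["NP", "VPpass"]],
    "VPact": [["V", "NP"]],
    "VPpass": [["was", "Part", "by", "NP"]],
    "NP": [["the", "N"]],
    "N": [["man"], ["ball"]],
    "V": [["hit"], ["took"]],
    "Part": [["hit"], ["taken"]],
}
LEMMAS = {"hit": "hit", "took": "take", "taken": "take"}

def parse(symbol, tokens, i):
    """Yield (tree, next_index) for each way symbol derives tokens starting at i."""
    if symbol not in GRAMMAR:                    # terminal: must match the next token
        if i < len(tokens) and tokens[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, j in parse_seq(rhs, tokens, i):
            yield (symbol, children), j

def parse_seq(symbols, tokens, i):
    """Yield (children, next_index) for a sequence of grammar symbols."""
    if not symbols:
        yield [], i
        return
    for first, j in parse(symbols[0], tokens, i):
        for rest, k in parse_seq(symbols[1:], tokens, j):
            yield [first] + rest, k

def head(np):
    _, (_, noun) = np                            # NP -> the N
    _, (word,) = noun
    return word

def lemma(node):
    _, (word,) = node
    return LEMMAS[word]

def meaning(tree):
    """Map a parse tree to (predicate, agent, patient)."""
    _, (subject, vp) = tree
    label, kids = vp
    if label == "VPact":                         # NP1 V NP2: subject is the agent
        verb, obj = kids
        return (lemma(verb), head(subject), head(obj))
    _, part, _, agent = kids                     # NP2 was Part by NP1: subject is the patient
    return (lemma(part), head(agent), head(subject))

def interpret(sentence):
    """Return the meaning of the first complete parse, or None if there is none."""
    tokens = sentence.split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return meaning(tree)
    return None

print(interpret("the man hit the ball"))          # ('hit', 'man', 'ball')
print(interpret("the ball was hit by the man"))   # ('hit', 'man', 'ball')
```

The passive is just another CF rule here; the relation a transformation would express is captured in `meaning`, which assigns the subject of a VPpass clause to the patient role.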
Chomsky's rejection of statistics not only affected theoretical
linguistics but even spread to AI and comp. ling. As an example,
a colleague of mine at IBM, Eva Mueckstein, had written a PhD
dissertation in grammar theory: she showed how a context-free
grammar combined with a finite-state control for selecting which
CF rules to apply could provide the equivalent of a context-sensitive
grammar. She was hired by Fred Jelinek, who gave
her the task of adding probabilities to the FS control. After
getting some promising results, she submitted a paper to IJCAI
in 1981. But the paper was rejected with the curt reply,
"Statistics is not AI."
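The details of Mueckstein's formulation are not given here, so the following is only an illustration of the general idea, using a standard construction from formal language theory (a regularly controlled grammar) as an assumed stand-in. Each CF rule has a label, and a finite-state controller decides which rule may be applied next. The control sequence r0 (r1 r2 r3)* r4 r5 r6 forces the three nonterminals to grow in lockstep, yielding a^n b^n c^n, which no context-free grammar alone can generate. The transition probabilities (0.6 and 0.4) are hypothetical.

```python
import random

# Labeled context-free rules: label -> (left-hand side, right-hand side).
RULES = {
    "r0": ("S", ["A", "B", "C"]),
    "r1": ("A", ["a", "A"]),
    "r2": ("B", ["b", "B"]),
    "r3": ("C", ["c", "C"]),
    "r4": ("A", ["a"]),
    "r5": ("B", ["b"]),
    "r6": ("C", ["c"]),
}

# Finite-state control: state -> list of (rule label, next state, probability).
# The choice at q1 (loop again or finish) is the only probabilistic one.
CONTROL = {
    "q0": [("r0", "q1", 1.0)],
    "q1": [("r1", "q2", 0.6), ("r4", "q4", 0.4)],
    "q2": [("r2", "q3", 1.0)],
    "q3": [("r3", "q1", 1.0)],
    "q4": [("r5", "q5", 1.0)],
    "q5": [("r6", "qF", 1.0)],
}

def apply_rule(form, label):
    """Rewrite the leftmost occurrence of the rule's left-hand side."""
    lhs, rhs = RULES[label]
    i = form.index(lhs)          # raises ValueError if the rule is not applicable
    return form[:i] + rhs + form[i + 1:]

def generate(rng):
    """Run the controller from q0 to qF, applying one CF rule per transition."""
    state, form, prob = "q0", ["S"], 1.0
    while state != "qF":
        options = CONTROL[state]
        label, state, p = rng.choices(options, weights=[o[2] for o in options])[0]
        form = apply_rule(form, label)
        prob *= p
    return "".join(form), prob

rng = random.Random(42)
for _ in range(3):
    string, p = generate(rng)
    print(string, round(p, 4))
```

Because the only choice point is the loop at q1, the probability of a^n b^n c^n is 0.6^(n-1) * 0.4, a geometric distribution over n.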
In the late 1980s, another colleague, who still believed in Chomsky,
said "But we don't know much about semantics." That was thirty years
after the MT work on semantics and seventeen years after Montague.
Even worse, it was over six centuries after Ockham (1323) had written
a semantic analysis of the entire Latin language (which would, even
today, be an excellent introduction to model-theoretic semantics,
especially for linguists who are terrified by Montague's notation).
At the end of the preface, Chomsky cited the support he received from
the U.S. Army, Navy, and Air Force. Those payments were part of the
money MIT received for work on machine translation. If Chomsky had
done the work he was being paid for, he could have learned a lot about
how language actually works. Both MT and theoretical linguistics
might have benefited enormously.
John Sowa
--------------------------------------------------------------------
References:
Chomsky, Noam (1957) _Syntactic Structures_, Mouton, The Hague.
Fries, Charles Carpenter (1952) _The Structure of English_,
Harcourt, Brace & Company, New York. For a tribute to Fries, see
http://itre.cis.upenn.edu/~myl/languagelog/archives/003743.html
Harris, Zellig (1951) _Methods in Structural Linguistics_, University
of Chicago Press, Chicago. For a tribute to Harris, see
http://www.dmi.columbia.edu/zellig/
Masterman, Margaret (2006) _Language, Cohesion and Form_, edited
by Yorick Wilks, Cambridge University Press. For a review, see
http://www.jfsowa.com/pubs/mmb_rev.htm
Ockham, William of (1323) _Summa Logicae_. _Ockham's Theory of Terms_,
translation of Part I by M. J. Loux, University of Notre Dame Press,
Notre Dame, IN, 1974. _Ockham's Theory of Propositions_, translation
of Part II by A. J. Freddoso & H. Schuurman, University of Notre Dame
Press, Notre Dame, IN, 1980.
Post, Emil L. (1943) "Formal reductions of the general combinatorial
decision problem," _American Journal of Mathematics_, 65, 197-215.
Post, Emil L. (1947) "Recursive unsolvability of a problem of Thue,"
_Journal of Symbolic Logic_, 12, 1-11. Reprinted in M. Davis, ed.,
_The Undecidable_, Raven Press, Hewlett, NY, 1965, pp. 293-303.
Shannon, Claude E. (1948) "The mathematical theory of communication,"
_The Bell System Technical Journal_, Vol. 27, pp. 379-423, 623-656.
http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
Shannon, Claude E., & Warren Weaver (1949) _The Mathematical Theory
of Communication_, Univ. of Illinois Press, Urbana.
Wells, Rulon (1947) "Immediate constituents," _Language_, 23, 81-117.