Response to Joybrato Mukherjee regarding
"The Cambridge Grammar of the English Language"
by Rodney Huddleston and Geoffrey K. Pullum

Joybrato Mukherjee's recent Book Discussion Forum posting
(LINGUIST 13.1853, July 4, 2002) draws an unfavorable comparison
between "The Cambridge Grammar of the English Language" (Huddleston
and Pullum 2002; hereafter CGEL) and two grammars (closely related to
each other) that Mukherjee prefers: Quirk et al. (1985; henceforth
Quirk) and Biber et al. (1999; henceforth Biber).  He criticizes CGEL
for not being corpus-based, and for adopting analyses on grounds of
dogma rather than evidence. But it is Mukherjee who fails to show
respect for textual evidence.  He makes misattributions with respect
to all three of the grammars he discusses, and fails to check facts
before delivering opinions.  Essentially all his negative criticisms
of CGEL rest on false claims.  Here we offer a brief response to half
a dozen especially egregious ones.

1. Binary branching.
Mukherjee announces that CGEL analyses "take for granted that syntactic
constituent structure should be represented by strictly binary-branching
(or, in some cases, singulary-branching) trees."  He thinks this is "but
one example of the influence that generative concepts have obviously
exerted on the Cambridge Grammar."  But even a cursory glance at the 40
tree diagrams in CGEL would show that the claim about our alleged binarism
is false (a full list of trees is provided on p. xiii, so he could easily
have checked).  Multiple branching is visible in the coordination example
on p. 1279.  And the discussion in chapter 4 of phrases with two or more
complements, such as "give her this", makes it clear that we assume
ternary branching.  Binary branching would make "this" the complement of
"give her" rather than of "give" -- or else would assign "give" a single
complement, a phrase of the form "her this".  CGEL adopts many insights
from generative grammatical research, but sides with Jackendoff (1990)
against much other current generative work in rejecting both kinds of
binary analysis.  CGEL maintains that in "give her this", the NPs "her"
and "this" are both complements of (and sisters of) "give".
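
To make the three competing structures concrete, here is a minimal sketch
in Python. It is an illustration only: the nested-tuple encoding, the
placeholder label "X", and the helper names are assumptions made for this
example, not notation used in CGEL.

```python
# Constituent structures as nested tuples: (label, daughter, daughter, ...).
# A leaf is a plain string. The label "X" is a placeholder, not a CGEL category.

flat = ("VP", "give", "her", "this")                  # ternary: both NPs are sisters of "give"
left_binary = ("VP", ("X", "give", "her"), "this")    # "this" is a complement of "give her"
right_binary = ("VP", "give", ("X", "her", "this"))   # "give" takes one complement, "her this"


def max_branching(tree):
    """Return the largest number of daughters of any node in the tree."""
    if isinstance(tree, str):
        return 0
    daughters = tree[1:]
    return max([len(daughters)] + [max_branching(d) for d in daughters])


def sisters(tree, word):
    """Return the sisters of the leaf `word`, or None if the leaf is absent."""
    if isinstance(tree, str):
        return None
    daughters = tree[1:]
    if word in daughters:
        return [d for d in daughters if d != word]
    for daughter in daughters:
        found = sisters(daughter, word)
        if found is not None:
            return found
    return None


for name, tree in [("flat", flat), ("left", left_binary), ("right", right_binary)]:
    print(name, max_branching(tree), sisters(tree, "give"))
```

Only in the flat structure are "her" and "this" both sisters of "give";
each binary alternative gives "give" a single sister.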

2. The subject-predicate division.  Mukherjee thinks CGEL's acceptance
of the NP-VP (subject-predicate) analysis of clauses stems from the
authors' bigoted binarism, and he also thinks that Quirk rejects the
binary analysis.  Both claims are mistaken.  Quirk does make a binary
division in clause structure between subject and predicate. True, a remark
is made to the effect that "we shall find little need to refer to the
predicate as a separate structural unit", but immediately after that
comes a note about a case where it has real significance (see Quirk,
p. 79).  CGEL's acceptance of the subject--predicate division certainly
has nothing to do with acceptance of results from generative grammar;
the subject--predicate division is familiar from traditional grammar,
as Quirk notes (p. 78).

3. Multiple analyses.  Mukherjee alleges that CGEL does not allow for
multiple analyses. He prefers Quirk's treatment of "She looked after her
son" as both Subject - Verb - "Adverbial" (She - looked - after her son)
and Subject - Verb - Object (She - looked after - her son). This criticism
is another double error.  CGEL provides detailed syntactic evidence in
favour of the former bracketing (though with "after her son" as a
complement, not an adjunct; Quirk's term "adverbial" is ill-advised, and
this is a point on which we believe Quirk got things wrong).  Mukherjee gives
no reason for wanting to allow the second as well.  And there is solid
evidence against it: it would predict the possibility of postposing a
heavy NP object, yielding:

    *She looked after all morning the children from several other
     families in her street.

But CGEL does allow multiple analyses where appropriate. In the
construction "Bob is as generous as Sue", the complement of "as" may be
either an NP or an elliptical clause consisting of nothing but a
subject NP. This differs from the prepositional verb case in that there
is no compelling syntactic evidence to choose one analysis over the
other. Thus CGEL uses evidence to distinguish between situations in
which a constituent structure claim is motivated and cases where it is
not.  Quirk fails to do this.  Mukherjee has things backwards.

4. Corpus use.  Mukherjee sees it as a "major weakness of the Cambridge
Grammar" that it is not more corpus-based. He asks: "Can a reference
grammar of the English language, published in the year 2002, really be
based on corpus material containing three million words only? I would
say no."  He says no, but he does not say why. The fact is that CGEL
was not based on three million words. Its range of sources was vast:
the authors' lifelong experience of the English language; the similar
experience possessed by a dozen other native-speaker collaborating
authors; further evidence pointed out by others; facts cited in hundreds
of technical articles and books; the large grammars of Poutsma, Jespersen,
Quirk, and others; the Oxford English Dictionary; various
collections of texts that we happened to have on computer, including the
44 million words of the Wall Street Journal corpus (WSJ); and where
necessary the World Wide Web.  (The British National Corpus became
available to us only when CGEL was almost complete.)

It is true that for examples we standardly mined the Brown corpus for
American English, the Lancaster-Oslo/Bergen corpus for British English,
and the Australian Corpus of English for Australian English (we had
convenient interactive access to these through the courtesy of
Macquarie University), and these total three million words. But these
corpora were merely sources of illustrative examples, nearly always
edited for expository reasons.  (It is one of the errors of strictly
corpus-oriented grammars to use only raw attested data for purposes of
illustration. We think it is counterproductive to quote a sentence with a
subject NP containing a long and distracting relative clause when all we
are concerned to illustrate is the order of adjuncts in the verb phrase.)

Mukherjee maintains the peculiar view that WSJ is not a corpus at all.
He says: "the Wall Street Journal, in my view, does not qualify as a
representative 'corpus' but is an example of a linguistically
unstructured 'archive' (which may be used as a source of authentic
examples but from which general trends in language cannot be
extrapolated)."  What improvement results in a descriptive grammar
if we rigorously restrict our attention to some "representative" corpus is
not made clear. Mukherjee may be confusing the purpose of a descriptive
reference grammar with the aim of statistical studies of frequencies of
specific words or constructions across genres, dialects, or times (Biber
specializes in providing this sort of information). We were not attempting
a survey of trends or genre differences; we were writing a grammar of
international Standard English.

5. Extraposition.  The one way to show that a descriptive grammar failed
by not being corpus-based enough would be to point out something that was
missed because of a failure to attend to corpus evidence.  Mukherjee's
only attempt at making a point like this concerns extraposition. He
asserts that Quirk and Biber, taken together, give a more convincing
account of English than CGEL does. Mukherjee cites Biber's corpus-restricted
study based on a 40-million-word collection of texts (which follows Quirk
on most points of syntactic analysis) in support of the claim that
extraposition constructions like "It would be pointless to resist" are
more basic than non-extraposition clausal-subject counterparts like "To
resist would be pointless".

Now, let there be no disagreement about frequency: CGEL states clearly
that the extraposition construction is much more frequent. (No need of a
40-million-word corpus to establish this, incidentally.  Huddleston 1971
had no Brown, LOB, ACE, or computer, and worked with a corpus of only
about 135,000 words.  He found 3 examples of the clausal subject
construction to 89 with extraposition, certainly a dramatic enough
difference to establish the conclusion.)  However, CGEL goes on to
explain, in the paragraph right after the passage Mukherjee cites (p. 1403),
why the non-extraposition construction is analytically more basic: it is
syntactically simpler, and has a structure that is normally the only one
available for NP subjects.
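
The size of the frequency difference can be checked with a few lines of
arithmetic. The sketch below uses only the two counts quoted above from
Huddleston (1971); the variable names are invented for this illustration.

```python
# Counts quoted above from Huddleston (1971); the variable names are ours.
clausal_subject = 3    # e.g. "To resist would be pointless"
extraposed = 89        # e.g. "It would be pointless to resist"

total = clausal_subject + extraposed
share_extraposed = extraposed / total    # proportion of cases with extraposition
ratio = extraposed / clausal_subject     # extraposed cases per clausal-subject case

print(f"extraposition share: {share_extraposed:.1%}")      # 96.7%
print(f"extraposed per clausal subject: {ratio:.1f}")      # 29.7
```

These figures concern frequency alone; they have no bearing on which
construction is analytically more basic.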

Mukherjee cites the CGEL distinction between canonical and non-canonical
structures as one of the book's "many examples of refreshingly innovative
concepts and/or terminology"; but he apparently does not see its relevance
here. Non-extraposition clauses exemplify canonical clause structure and
are in that sense more basic.  The canonical vs non-canonical distinction
permits a simplified grammar presentation: we first confine attention to
elementary constructions and then deal with others in terms of how they
differ. We hold (contrary to what is implicit in many generative accounts)
that it would be a mistake to include extraposed subjects among the
elements figuring in the structure of canonical clauses.

When we turn to Biber we find that, contrary to what Mukherjee suggests,
there is no substantive difference with CGEL anyway: Biber's section 3.5
(pp. 141-52) deals with "major clause patterns", while extraposition is
introduced in section 3.6, headed "Variations on clause patterns" and
beginning, "In addition to the basic clause patterns...". Mukherjee has
failed to notice that the grammar he prefers takes the same analytical
view as CGEL.

6. Conclusion.  Essentially all of Mukherjee's critical comments about
CGEL stem from factual errors about the book or about other books.  This
is true not only for points of analysis, but for simple points about
presentation that anyone could check.  For example, he complains that
"only very few tables and diagrams are used" in CGEL relative to Quirk.
We have not done a full comparative listing of all tables and diagrams
(it is unclear where to draw the line between tables and mere columned
displays), but we checked two comparable chapters for trees: there are
just 4 tree diagrams (two merely skeletal) in Quirk's chapter on
coordination (Ch. 13), whereas CGEL's corresponding chapter (Ch. 15)
contains 16 fully detailed trees. We believe these chapters are
representative.  Mukherjee has not done the homework to back up his
critique even on simple counting such as this.

The reader of Mukherjee's review should be cautioned, therefore, that he
does not practice his quantitative preaching.  He talks the corpus-based
talk, but when elaborating his impressionistic comparison of CGEL with
Quirk and Biber, he does not walk the walk.


Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward
Finegan (1999): Longman Grammar of Spoken and Written English. Harlow:
Pearson Education.

Huddleston, Rodney (1971): The Sentence in Written English: A Syntactic
Study Based on an Analysis of Scientific Texts. Cambridge: Cambridge
University Press.

Huddleston, Rodney, and Geoffrey K. Pullum (2002): The Cambridge Grammar
of the English Language. Cambridge: Cambridge University Press.

Jackendoff, Ray S. (1990): On Larson's treatment of the double object
construction.  Linguistic Inquiry 21:427-456.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan Svartvik (1985):
A Comprehensive Grammar of the English Language. London: Longman.

                                                         Rodney Huddleston
                                            r.huddleston at mailbox.uq.edu.au
                                                  University of Queensland

                                                        Geoffrey K. Pullum
                                                      pullum at ling.ucsc.edu
                                      University of California, Santa Cruz

