Date:  Sat, 20 Jul 2002 22:32:39 +0200
From:  <j.mukherjee at uni-bonn.de>
Subject:  Disc: A response concerning The Cambridge Grammar of the English Language

A reply to Rodney Huddleston and Geoffrey K. Pullum
concerning The Cambridge Grammar of the English Language

Joybrato Mukherjee, University of Bonn

In their response to my review of The Cambridge Grammar of
the English Language (LINGUIST 13.1932), Rodney Huddleston
and Geoffrey K. Pullum claim that my "critical comments
about CGEL stem from factual errors about the book or other
books." In this context, they point out six issues in
particular ("especially egregious ones", as they put it) to
which I will briefly turn in the following. Given the
strangely offensive tone of their response, it is with some
reluctance that I respond to their comments. Yet, as things
stand, a
corrective to the picture of the reviewer (as someone who
lacks even basic reading skills) that has been drawn by the
authors of the Cambridge Grammar in the Book Discussion
Forum is needed. However, the authors should not expect any
further messages from me.

The following abbreviations will be used:
CamGr - The Cambridge Grammar of the English Language
(Huddleston and Pullum 2002)
CGEL - A Comprehensive Grammar of the English Language
(Quirk et al. 1985)
LGSWE - Longman Grammar of Spoken and Written English
(Biber et al. 1999)
REV - Review of CamGr (by J. Mukherjee, LINGUIST 13.1853)
RESP - Authors' response (by R. Huddleston and G. K. Pullum,
LINGUIST 13.1932)

(NB: It is unfortunate that Huddleston and Pullum seem to
insist on using CGEL for the Cambridge Grammar, although
this abbreviation has already been widely used for the
Comprehensive Grammar.)

1. Binary branching

As Huddleston and Pullum themselves point out, "there will
be relatively few readers who begin at the beginning and
work their way through the chapters to the end" (CamGr, p.
44). It stands to reason, then, that most readers are
expected to read the first two chapters ("Preliminaries"
and "Syntactic Overview") and proceed to those chapters and
sections that are relevant to their individual needs. From
a complementary perspective, it should be quite clear that
all other chapters and sections will be read against the
background of the basic concepts and general principles
introduced at the beginning.

The two kinds of branching that are introduced and
visualised in the introductory part are binary branching
and singulary branching (CamGr, p. 26). In RESP,
the authors introduce multiple branching and ternary
branching and point to a specific coordination example
(CamGr, p. 1279) in which multiple branching is visible.
This had not escaped my notice. As a matter of fact, I did
check all tree diagrams (by the way, diagram (24) is not on
p. 1098, as listed on p. xiii of CamGr, but on p. 1089): a
genuine ternary/multiple branching can only be found for
coordination and ditransitive verbs. However, the authors
give the impression in RESP that CamGr shows no
preference whatsoever for binary/singulary branching. But
it is obvious to any reader that there is a clear focus
throughout CamGr on syntactic analyses on the basis of
binary/singulary branching. And this is a focus that cannot
be found in CGEL. In criticising REV, the authors
repeatedly seem to confuse general core principles and
peripheral special cases. In fact, binary/singulary
branching is presented at the beginning of CamGr as a
general principle from which it would be necessary to
deviate only in order to account for special phenomena.
That coordination is a syntactic phenomenon of a peculiar
type in this regard is mentioned by the authors themselves
(CamGr, p. 66). By the way, the clear focus on - and
preference for - binary/singulary branching is borne out
by the fact that the conceptual index of CamGr only
includes entries for 'binary branching' and 'singulary
branching', but not for 'multiple branching' or 'ternary
branching' - terms that the authors introduce in RESP but
that obviously play only a peripheral role in CamGr.

Similarly, the authors seem to resent my claim that CamGr -
unlike CGEL - is strongly influenced by generative
concepts, although they themselves speak of "many insights
from generative grammatical research" (RESP) that have been
included in CamGr. Since a review, as I see it, is intended
to provide potential readers with information on what to
expect from the book at hand, it is thus necessary to point
out that CamGr is influenced by generative grammar in
general and favours analyses along the lines of
binary/singulary branching in particular. On the whole,
these are features that are typical of CamGr, although in
comparatively few cases multiple branching may be adopted
and particular generative concepts/analyses may not be
taken over (as mentioned in REV). What is more, these
features clearly distinguish CamGr from CGEL.

2. The subject-predicate division

Huddleston and Pullum are right in pointing out in RESP
that the subject-predicate division, which is at the heart
of CamGr, can also be found in CGEL. However, they ignore
the fundamentally different extents to which this binary
division is capitalised on in the two grammars. Firstly, it
is at the basis of virtually all syntactic analyses in
CamGr, while in CGEL it is drawn on in order to explain
no more than a handful of particular phenomena, e.g.
clause negation (CGEL, p. 1064ff.). In fact, what is much
more important than the predicate in CGEL is the concept of
predication, i.e. the predicate excluding the operator.
Secondly, the predicate is regarded as the head of the
clause in CamGr (p. 24), while this is not the case in
CGEL. Thirdly, the term VP is used for the realisation of
the predicate in its entirety in CamGr: predicate and VP
are co-extensive. In CGEL, on the other hand, the verb
phrase is not co-extensive with the predicate, because VP
is the realisation of a more rigidly defined clause
element, i.e. the verb without any complements. It
would be another question altogether, by the way, whether
the subject-predicate distinction and the NP-VP distinction
(which are conflated by Huddleston and Pullum in RESP) can
both be traced back to traditional grammar or whether the
traditional subject-predicate distinction was taken up by
generativists in terms of NP-VP. The important point here
is that the subject-predicate division is a cornerstone of
CamGr, while it is not at all central to CGEL.

3. Multiple analyses

Huddleston and Pullum state that CamGr "does allow multiple
analyses where appropriate" (RESP). They point to the
construction "Bob is as generous as Sue" for which two
analyses of the complement of "as" are offered (CamGr, p.
1113ff.). (One piece of quisquilia should be mentioned: the
construction "as...as" is not listed in the lexical index
of CamGr.) It has not escaped my notice that there are also
other examples of different analytical approaches to
specific phenomena, e.g. the "dependent-auxiliary analysis"
and the "catenative-auxiliary analysis" of core auxiliaries
(CamGr, p. 1210ff.). In criticising REV, however, the
authors seem to lose sight of the fact that multiple
analyses play a fundamentally different role in CamGr and
CGEL. Notwithstanding the few fields in which CamGr makes
use of multiple analysis, the reader is never given the
impression that alternative/competing/multiple analyses are
a significant principle of CamGr. Again, it is telling that
the conceptual index lists neither "multiple analysis"
(which is not surprising since it is a term peculiar to
CGEL) nor "alternative/competing analysis" (the terms that
Huddleston and Pullum use when discussing the examples
mentioned above). On the other hand, CGEL concludes its
introductory second chapter with two sections on gradience
as "a guiding principle" and multiple analysis as a window
on grammar as an "indeterminate system" (CGEL, p. 90f.).
While multiple analysis is central to CGEL, it is adopted
in CamGr only if the authors see no compelling evidence in
favour of one particular analysis (which is the exception).
Thus, I stand by my line of argumentation in REV that the
extents to which multiple analyses come into play in CamGr
and CGEL are fundamentally different.

An aside: in this context, Huddleston and Pullum pick up on
the example of "She - looked - after her son" vs. "She -
looked after - her son" and state that "Mukherjee gives no
reason for wanting to allow the second as well" (RESP).
There are some good reasons, all of which are hinted at in
CGEL (e.g. p. 1155f.): the prepositional verb which is at
the basis of the second analysis is a semantic unit (one
could hypothesise that it is also an acquisitional unit),
it can usually be replaced by a one-place lexical verb, and
there is a structural analogy that can be drawn between the
prepositional verb and the object on the one hand and a
non-prepositional verb and the object on the other hand.
(Structural analogy, of course, is another key concept
which distinguishes CGEL from CamGr, but this is another
issue.) The point here is not that one particular analysis
is inherently better (needless to say, there is evidence
for and against either analysis); rather, it is one
illustrative example of the fact that CamGr very often
favours one particular analysis while CGEL does not.

4. Corpus use

Huddleston and Pullum's criticism of my remarks on corpus
use in CamGr is unacceptable, because they quote two
sentences in isolation. However, my line of argumentation
is not captured by those two sentences alone.

To begin with, I explicitly listed all kinds of data that
the CamGr is based on: "(1) their (i.e. the authors') own
intuitions as native speakers; (2) other native speakers'
intuitions; (3) computer corpora; (4) other (pre-corpus and
corpus-based) dictionaries and grammars" (REV). (By the
way, I gave the correct page number (p. 11) in REV, but
mistakenly referred to the preface, although the
information is given in Chapter 1.) With regard to (3),
three one-million-word corpora are specified: Brown, LOB
and ACE. If the authors used other corpora directly, they
should have specified them. In RESP, they list other -
certainly valuable - text databases that they had access
to. However, none of them, in my view, is a corpus. As for
the OED, the WWW and the collection of texts the authors
had on computer, Huddleston and Pullum do not attempt to
subsume them under the notion of 'corpus'. As for the 44
million words of the Wall Street Journal (WSJ), they state
that "Mukherjee maintains the peculiar view that WSJ is not
a corpus at all" (RESP). For one thing, this is not a peculiar
view of mine. The distinction between representative
corpora and linguistically unstructured archives (such as
the WSJ) can also be found, for example, in Leech (1991:
11) and Kennedy (1998: 57). In a wider setting, it seems to
me that 'corpus use' has become a buzzword, but it is often
overlooked that there is more to a corpus than the sheer
amount of data it includes. (Of course, it remains a matter
of dispute what exactly representativeness in corpus design
means and, accordingly, what a corpus is. However, the
authors should not dismiss the reviewer's view as exotic
and untenable.) What is more, I did point out in general
terms (contrary to what Huddleston and Pullum claim in
RESP) why representativeness of the database is useful for
a descriptive grammar (otherwise, "general trends in
language cannot be extrapolated", REV). Furthermore,
Huddleston and Pullum think that "Mukherjee may be
confusing the purpose of a descriptive reference grammar
with the aim of statistical studies of frequencies" (RESP).
I am not. The simple fact is that frequency and grammar are
inseparable, because, in a sense, there always is a
frequency-based threshold level: not everything that appears
in performance data can/should be included in a grammar,
and the decision on what to include - picking up on Aarts'
(1991) terminology - usually has to do with 'normalcy' and
'frequency'. Why, for example, do the authors include
specific words in the numerous wordlists they give (e.g. in
the list of mandative verbs, adjectives, and nouns, CamGr,
p. 999)? On a merely intuitive basis? And/or because these
words are attested at least once in their database? And/or
on grounds of frequency of occurrence? If occurrence and/or
frequency in natural discourse are relevant, two questions
arise if the grammatical description is to be testable: (1)
Where do the data come from? (2) Where do the frequencies
(of, say, relevant corpus-based resources) come from?

CamGr provides no answer to either of the questions. The
reader does not know which of the examples are invented,
edited or natural (nor, if they are authentic, where they
come from). The authors seem to think that this is
irrelevant anyway. On the other hand, I would contend that
this kind of information is of great importance from an
empirical point of view and not at all a lightweight
matter. Of course, it is fair enough to draw on
corpus-based insights provided by dictionaries and grammars that
are already available. (In fact, the authors fail to
acknowledge the true nature of my criticism: I did not
accuse them of having ignored corpus data. They simply do
not go into details about the data resources and where they
come into play in CamGr.) However, it is certainly
unfortunate that the reader is never told which of the
wordlists are taken over from, say, specific corpus-based
grammars (and the corpora on which they are based). In this
context, it is for example telling that the reader does not
even know whether it is the first edition of the Collins
COBUILD English Dictionary (Sinclair 1987, cf. CamGr, p.
1765), based on 20 million words, or the second edition
(Sinclair 1995, cf. CamGr, p. 1772), based on 200 million
words, that has been used by the authors in the first place.

Generally speaking, then, the authors and the reviewer
disagree on two crucial points: the notion of corpus and
the theoretical and methodological implications of the use
of corpus data. In a sense, the different opinions
culminate in the authors' description of LGSWE as a
"corpus-restricted study" (RESP), whereas I prefer to
regard it as a corpus-based grammar. Which brings me to the
issue of extraposition/non-extraposition.

5. Extraposition

Huddleston and Pullum think that the reviewer is unable to
acknowledge the distinction between canonical and non-
canonical structures in CamGr ("he apparently does not see
its relevance here", RESP). This is not the case. In fact,
I did not call into question that "non-extraposition is
analytically more basic: it is syntactically simpler, and
has a structure that is normally the only one available for
NP subjects" (RESP). From a syntactic point of view, there
is unanimous agreement on this description. And, indeed,
one would not need corpus data for this conclusion.

The point is that whatever counts as basic is a matter of
linguistic interpretation and, more important, of the
criteria that are taken into consideration: basicness is
not out there. In REV, I took the liberty of simply
pointing to an alternative approach which could have been
mentioned in CamGr. Frequency, for example, is a criterion
that the authors do not take into account. Note that I am
not talking about frequency for its own sake but as a
quantitative signpost of something that is qualitative in
nature (about which more later). From this quantitative-
qualitative perspective, it would indeed be "better to
regard the extraposed form as the more basic form" (REV).
In REV, there is a reference to the corpus-based findings
that can be found in LGSWE.

Before coming to the LGSWE findings and their implications,
it should be noted that Huddleston and Pullum think that
there is no difference between CamGr and LGSWE in
considering non-extraposition as the basic form. At first
sight, this seems to be true. Interestingly enough, they
only refer to sections 3.5 and 3.6 of LGSWE - sections in
which frequency and distribution play a peripheral role:
"This characteristic of the grammar (i.e. quantitative,
empirical investigations) is less striking in Section B
(Chapters 2 and 3), since the primary purpose of those
chapters is to provide a descriptive framework of English
word classes and grammatical structures" (LGSWE, p. 44).
Corpus data and, more important, discussions of corpus-
based findings and their implications for grammatical
description, are at the heart of the subsequent chapters.
As for extraposition of to-clauses (the example mentioned
in REV), sections 9.4.6 and 9.4.7 are of particular
interest (LGSWE, pp. 722 ff.). The starting point here is
that, firstly, non-extraposed to-clauses are less frequent
than extraposed to-clauses in general and that, secondly,
there is a difference between spoken and written genres.
LGSWE gives several reasons for these findings, e.g.
reasons of processability, different production constraints
in spoken and written medium, and marked topicalisation by
means of non-extraposition. In the light of the
quantitative findings, LGSWE explicitly speaks of
extraposition as "the unmarked choice" (LGSWE, p. 725), and
this conclusion can be explained by factors such as the
ones mentioned above. In this case, frequency is thus
symptomatic of important discourse and processing factors.
And from this perspective, one could easily argue that
extraposition is 'more basic'. Whether one prefers
the syntactic approach outlined in CamGr or the
frequency-based approach sketched out in LGSWE, neither analysis
invalidates the other one. There is, however, no use in
ignoring the differences between CamGr and LGSWE when
comparing the two grammars in their entirety.

6. Number of figures, tables, diagrams etc.

Huddleston and Pullum take issue with my complaint
about the "lack of graphical visualisation" (REV), i.e. the
small number of tables, figures and diagrams.

However, in order to prove me wrong, they simply count
trees in chapter 15 of CamGr and in chapter 13 of CGEL.
The result is not at all surprising. They confine
themselves to trees because "it is unclear where to draw
the line between tables and mere columned displays" (RESP).
This is certainly true. But may I add that this very line
could be much more easily drawn (say, for counting
purposes) if the tables, diagrams and figures had been
numbered separately in CamGr (as is done, with some
inconsistencies, in CGEL). Be this as it may, the fact is
that my criticism, put forward in REV, was not about the
number of tree diagrams in CamGr. Also, the authors give no
reason why "they believe these chapters are representative"
(RESP).
In preparing REV, I conducted a more refined counting of
tables/table-like displays and diagrams/figures (incl. tree
diagrams) in chapter 3 of CamGr ("The verb", 141 pages) and
in chapters 3/4 of CGEL ("Verbs and auxiliaries"/"The
semantics of the verb phrase", 154 pages). The two samples
are of comparable size, and I still think that the description of
the verb is much more central to grammar than a more
specialised chapter (say, on coordination) and thus more
'representative' of the grammars at hand.

In CGEL, there are 29 tables (3.2, 3.5a, 3.5b, 3.5c, 3.12,
3.13, 3.14, 3.15, 3.16, 3.17, 3.18, 3.19, 3.20, 3.32, 3.33,
3.36, 3.39, 3.40a, 3.40b, 3.42, 3.52, 3.56a, 3.56b, 3.64,
4.17, 4.28, 4.30, 4.33, 4.66) and 15 figures (3.21, 3.55,
3.65, 4.2a, 4.2b, 4.7, 4.14, 4.18, 4.19, 4.20, 4.24a,
4.24b, 4.24c, 4.27, 4.51).

In CamGr, I took into account the following 14 tables (as
stated above, the numbering is not reader-friendly): in
section 3.1 no. 1, 2, 3, fn 1, 35; in section 3.2 no. 17,
43, 48, 49; in section 3.3 no. 1; in section 3.4 no. 2, 6;
in section 3.7 no. 2, 3. Additionally, there are 2 figures:
in section 3.3 no. 6; in section 3.5 no. 4. (There are some
other borderline cases, but I would contend that as soon as
one starts to discuss whether it is a table/figure or not,
it is certainly not as clear-cut a visual aid as all the
above-mentioned tables and figures in CGEL and CamGr which
everyone would readily regard as tables and figures.)

The quantitative differences between CamGr and CGEL in this
sample analysis are statistically significant at the one
percent level.
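
The test behind this statement is not named in the text. As a
rough illustration only, one plausible check is a chi-square
goodness-of-fit test on the totals given above (44 tables and
figures in 154 pages of CGEL, 16 in 141 pages of CamGr), with
expected counts proportional to the number of pages. The choice
of test, the function name and the variable names are
assumptions made for this sketch.

```python
# Illustrative sketch: the choice of test is an assumption and is
# not necessarily the procedure that was used for the review.

# Totals of tables plus figures, as listed in the text above.
counts = {"CGEL": 29 + 15, "CamGr": 14 + 2}
# Sample sizes in pages, as stated in the text above.
pages = {"CGEL": 154, "CamGr": 141}


def chi_square_by_pages(counts, pages):
    """Chi-square goodness-of-fit statistic.

    Null hypothesis: visual aids are spread evenly per page, so each
    grammar's expected count is proportional to its page count.
    """
    total_count = sum(counts.values())
    total_pages = sum(pages.values())
    statistic = 0.0
    for name, observed in counts.items():
        expected = total_count * pages[name] / total_pages
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Standard chi-square critical value for 1 degree of freedom, alpha = 0.01.
CRITICAL_1_PERCENT_DF1 = 6.635

chi2 = chi_square_by_pages(counts, pages)
significant = chi2 > CRITICAL_1_PERCENT_DF1
print(f"chi-square = {chi2:.2f}; exceeds 1% critical value: {significant}")
```

Scaling the expected counts by page numbers allows for the
slightly different lengths of the two samples. Under these
assumptions the statistic lies above the one percent critical
value, which is in line with the statement above.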

Final remarks

It is a pity that the authors' response to REV and my
response to RESP have entirely focused on those aspects of
CamGr that the reviewer does not find convincing. It is,
thus, more than appropriate to summarise the positive
aspects that were mentioned in REV:

- comprehensiveness: breadth and depth of coverage
- systematisation of previous linguistic research
- a reference grammar with many new/promising concepts
- many examples of innovative terminology
- many wordlists
- in-depth treatment of morphology and word-formation
- well-structured

What is perhaps most important in the light of REV, RESP
and the present response is the fact that CamGr is an
unprecedented reference work that is different from all
other standard grammars of the English language. Therefore,
it is beyond reasonable doubt that many linguists will
agree with the authors that it "bridge(s) the large gap
between traditional grammar and the partial descriptions of
English grammar proposed by those working in the field of
linguistics" (CamGr, p. xv). Let me thus emphasise once
again that there are very good reasons why CamGr "is
without any doubt a reference work that should be available
to all grammarians" (REV). For some, it will certainly turn
out to be the preferred choice, for others it will not. And
many will use CamGr and other reference grammars side by
side - in general or for particular purposes.

I received quite a few replies to REV from people who have
already worked with CamGr. The diversity of the feedback
- ranging from "a great contribution" to "false claims" -
makes it clear that in grammar, too, beauty is in the eye
of the beholder.


Aarts, Jan (1991): "Intuition-based and observation-based
grammars", English Corpus Linguistics: Studies in Honour
of Jan Svartvik, ed. Karin Aijmer and Bengt Altenberg.
London: Longman. 44-62.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan
Conrad and Edward Finegan (1999): Longman Grammar of Spoken
and Written English. Harlow: Pearson Education. (LGSWE)

Huddleston, Rodney and Geoffrey K. Pullum (2002): The
Cambridge Grammar of the English Language. Cambridge:
Cambridge University Press. (CamGr)

Kennedy, Graeme (1998): An Introduction to Corpus
Linguistics. London: Longman.

Leech, Geoffrey (1991): "The state of the art in corpus
linguistics", English Corpus Linguistics: Studies in Honour
of Jan Svartvik, ed. Karin Aijmer and Bengt Altenberg.
London: Longman. 8-29.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan
Svartvik (1985): A Comprehensive Grammar of the English
Language. London: Longman. (CGEL)

Sinclair, John (ed.) (1987): Collins COBUILD English
Language Dictionary. London: Collins.

Sinclair, John (ed.) (1995): Collins COBUILD English
Dictionary. London: HarperCollins.

