12.1218, Review: Mair & Hundt, Corpus Linguistics

Wed May 2 21:54:12 UTC 2001

LINGUIST List:  Vol-12-1218. Wed May 2 2001. ISSN: 1068-4875.

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Lydia Grebenyova, EMU		Jody Huellmantel, WSU
	James Yuells, WSU		Michael Appleby, EMU
	Marie Klopfenstein, WSU		Ljuba Veselinova, Stockholm U.
		Heather Taylor-Loring, EMU		

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Terence Langendoen <terry at linguistlist.org>
 ==========================================================================
What follows is another discussion note contributed to our Book Discussion
Forum.  We expect these discussions to be informal and interactive; and
the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for discussion."  (This means that
the publisher has sent us a review copy.)  Then contact Simin Karimi at
     simin at linguistlist.org or Terry Langendoen at terry at linguistlist.org.

-------------------------------- Message 1 -------------------------------

Date:  Wed, 2 May 2001 23:25:02 +0200
From:  j.mukherjee at uni-bonn.de
Subject:  Review Mair/Hundt, Corpus Linguistics

Christian Mair and Marianne Hundt, eds. (2000)  Corpus
Linguistics and Linguistic Theory (Language and Computers:
Studies in Practical Linguistics No 33). Amsterdam/Atlanta:
Rodopi.

Reviewed by Joybrato Mukherjee, University of Bonn

This volume (announced on LINGUIST 12.272) comprises a
selection of papers from the Twentieth International
Conference on English Language Research on Computerized
Corpora, usually referred to as ICAME 20 (International
Computer Archive of Modern and Medieval English). The
conference was held in Freiburg, Germany, in May 1999. As
the convenors and editors of this book point out, the
conference motto - taken up in the book title - takes
account of the fact that it is "timely to focus on a
discussion of the changing relationship between current
practice in (ICAME-type) corpus linguistics and issues of
linguistic theory exercising the field as a whole" (p. 1).
Despite the
widening theoretical scope of corpus linguistics which is
put into perspective in this volume, all papers also
represent the traditional "ICAME type" of corpus-based
research in that they shed new light on specific aspects of
authentic language use by way of extensive and empirical
analyses of large corpora. In combining in-depth analyses
of corpus data with discussions of their relevance to
linguistic theory, this book no doubt makes for highly
stimulating reading. In my view, it is a pity that the
editors "have abstained from pigeonholing the contributions
in any way and resorted to the more neutral alphabetical
ordering" (p. 3). As things stand, I will, however, keep to
the alphabetical ordering of papers in the following
synopsis.

Synopsis

In the first paper, Bas Aarts makes a plea for more
"qualitative" research in corpus linguistics. He points to
the fact that mere statistics (or "number crunching")
cannot be an end in itself, but that frequencies in corpora
should always serve as a starting point for a truly
functional approach accounting for quantitative data. New
software programs such as the International Corpus of
English Corpus Utility Program (ICECUP) allow such
qualitative research to be conducted even in the field of
syntax since syntactically parsed corpora are now
available, e.g. the British component of the International
Corpus of English (ICE-GB). Apart from those important
theoretical considerations (which are exemplified by
investigating the distribution of transitivity patterns in
ICE-GB), this paper is also worth reading for another
reason. Aarts starts off by giving an extract from an
interview he conducted with Noam Chomsky at MIT. On
reading the first exchange, any corpus linguist will most
certainly laugh out loud and shake their head in
disbelief: "Bas Aarts: What is your view of modern corpus
linguistics? Noam Chomsky: It doesn't exist." (p. 5)

Bengt Altenberg and Karin Aijmer discuss the state of the
art in cross-linguistic research which draws on parallel
corpora. By diligently reviewing previous studies and
systematising different kinds of parallel corpora (i.e.
comparable corpora vs. translation corpora), they show how
the English-Swedish Parallel Corpus (ESPC) can be analysed
from different perspectives. The analysis of parallel
corpora provides some important insights into discrepancies
between different language systems. For example, agentless
English passives tend to be translated not only into
corresponding Swedish passive constructions, but also into
active sentences with the generic pronoun "man" or a
personal pronoun functioning as subject. Such system gaps
call for a careful definition of an appropriate "tertium
comparationis" as the point of reference in cross-
linguistic research.

The identification of sociolinguistic factors conditioning
the preferred selection of either "gonna" or "going to" in
the spoken component of the British National Corpus (BNC)
lies at the heart of Ylva Berglund's paper. The relevant
factors include, for instance, speakers' age, education and
social class. However, the quantitative data do not reveal
a clear-cut correlation between speakers' sex and the
preference for the standard variant over the reduced one. In
light of the widely held view that women are wont to use
standard variants more than men, this empirical analysis
nicely shows that authentic language data may well
contradict intuition-based assumptions.

Another well-established commonplace in linguistic research
is challenged by Sylvie de Cock. Her paper is devoted to
the use of highly recurrent word combinations (HRWCs) by
advanced learners of English whose mother tongue is
French. The database is provided by the relevant parts of
several learner corpora at the University of Louvain.
Comparable native-speaker databases serve as control
corpora. The data reveal that the foreign-soundingness of
learners' English is not only due to an overuse of cliche-
like prefabricated sequences of words, but that the picture
is rather more complex and includes at least four different
aspects of non-nativelike language use: misuse (deviant use
of English phrases which formally resemble but which are
not semantically identical with French phrases, e.g. "on
the contrary"); overuse (e.g. "and so on"); underuse (e.g.
"sort of"); use of learner-idiosyncratic combinations (e.g.
*"according to me"). Furthermore, it is necessary to
distinguish between spoken and written language since the
two media differ with regard to the frequencies and
distribution of those four aspects of non-nativelike use of
HRWCs.
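
The overuse/underuse distinction can be illustrated with a
minimal sketch. It is not de Cock's procedure: the function
names (ngram_counts, relative_freq, classify) are hypothetical,
the two samples are invented, and a real study would add a
significance test before calling any difference overuse or
underuse.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-word sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relative_freq(counts, ngram):
    """Proportion of all n-gram occurrences accounted for by one n-gram."""
    total = sum(counts.values())
    return counts[ngram] / total if total else 0.0

def classify(ngram, learner_tokens, native_tokens):
    """Label a word combination by comparing learner and native proportions."""
    n = len(ngram)
    learner_share = relative_freq(ngram_counts(learner_tokens, n), ngram)
    native_share = relative_freq(ngram_counts(native_tokens, n), ngram)
    if learner_share > native_share:
        return "overuse"
    if learner_share < native_share:
        return "underuse"
    return "no difference"

# Invented toy samples, NOT drawn from the Louvain corpora.
learner = "it is good and so on and it is nice and so on and so on".split()
native = "it is sort of good and it is sort of nice and so on".split()

for combo in [("and", "so", "on"), ("sort", "of")]:
    print(" ".join(combo), "->", classify(combo, learner, native))
```

Proportions rather than raw counts are compared so that the
learner and control samples need not be the same size.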

Learner English is also the object of inquiry in Pieter de
Haan's article, though from a more technical perspective.
He deals with principles and problems of automatically
tagging non-native English. The focus is on how to come to
grips with learner errors in the word-tagging procedure. To
this end, the author introduces a taxonomy table for
learner errors. On this basis, he describes possible
solutions as to how the Tag Selection Tool may cope with
different kinds of learner errors.

Inge de Moennink discusses the even more complex issue of
syntactically parsing learner English. Although computer-
aided error analysis is, in principle, feasible and
applicable to corpus annotation, a full-fledged
automatisation of the process seems to be impossible at
present, rendering the parsing procedure extremely time-
consuming. However, the semi-automatic solutions offered by
the author represent very useful suggestions since the
overall goal of parsing a learner corpus should not remain
wishful thinking in the long run. First, this would allow
for an empirical analysis of syntactic differences between
learners' and native speakers' language use. Second, it
would facilitate the development of an automatic error-
tagging system.

In a similar vein to Bas Aarts' paper, Juergen Esser's
discussion of "corpus linguistics and the linguistic sign"
is a truly programmatic celebration of the conference
motto. The close inspection of large amounts of corpus data
calls for a refinement (or even a revision) of the
Saussurean sign model. Firstly, its restriction to the
acoustic image needs to be overcome. The signifier should
be extended by considering the medium-bound realisations
(orthographic and phonological) of, say, a word.
Furthermore, the medium-dependent word-form realisations
should be integrated into the possible medium-independent
grammatical word-forms with specific ranges of meaning. For
example, "tree" in singular form is attested with the
meanings "plant" and "drawing" in the BNC, while the plural
form "trees" is associated exclusively with the first
meaning (i.e. "plant"). Secondly, these data warrant
a differentiation of the meaning-side of
the linguistic sign according to such sense-restrictions on
specified word-forms. Accordingly, Esser introduces the
notion of a lexical linguistic sign as "the union of a
single sense and a set of medium-independent, abstract
grammatical word-forms" (p. 97). Such form-meaning
associations within the Saussurean lexical sign can be
identified with the help of corpora.

Maria Estling's study illuminates so-called competing
constructions by investigating the frequency and
distribution of grammatical synonyms including the
quantifiers "all", "both" and "half". Drawing on relevant
parts of the CobuildDirect Corpus and newspaper corpora,
she places special emphasis on the comparison of British,
American and Australian usage. For example, while in
American English there is no clear preference for either of
the competing constructions "half a + modifier + noun" and
"a half + modifier + noun", in British and Australian usage
the former structure clearly outnumbers the latter. The
author also draws some important general conclusions from
the data. In particular, corpus analyses can help identify
the most frequent variants which should be taught first to
learners of English. Not the least likely to profit from
such data are advanced learners seeking detailed
information on when to use which competing construction.

Grammatical alternatives also play a role in Roberta
Facchinetti's study. She explores the use of "be able to",
which is suppletive to the modal "can" in non-finite
contexts, in present-day English. By comparing two written
standard corpora from the 1960s and the 1990s (and by
taking into account a written sample corpus from the BNC),
the author rejects the hypothesis that the use of "be able
to" is on the increase. Her careful qualitative analysis
reveals that "be able to" is used for specific semantic
reasons, even if "can" or "could" are possible: for
example, "be able to" may refer to the actuality of an
event or the fact that the subject successfully manages to
carry out the action. It would most certainly be
interesting to look at spoken material in future research.

The Chemnitz InterNet Grammar (CING) is a contrastive and
interactive learning environment available on the internet
and designed for German learners of English. Angela Hahn,
Sabine Reich and Josef Schmied sketch the descriptive
potential of this on-line tool by focusing on how to teach
the present progressive. CING includes an English-German
translation corpus which provides a wide range of examples
of translating the English progressive aspect into German
(which has no immediate aspectual equivalent). Thus,
learners have access to a well-chosen selection of
translations illuminating the use of the progressive. The
authors also suggest a theoretical model of the present
progressive which basically includes two parts: "the
reference time is included in event time" and "speech time
= reference time" (p. 138).

Janet Holmes proves that corpus-based methods are relevant
to sociolinguistics in general and gender studies in
particular. Lakoff's (somewhat impressionistic) assumption
that "lady" is gradually replacing "woman" is clearly and
empirically rejected by Holmes who investigates standard
British corpora from the 1960s and 1990s and New Zealand
corpora from the 1980s. On the contrary, her in-depth
semantic analysis of the data reveals that "woman" is now
the unmarked term for referring to adult females (and
taking them seriously) whereas "lady", once marked as
polite and respectful, is increasingly associated with a
negative semantic prosody which can be described as
conservative, patronising, dated and trivialising. Thus, it
does not come as too much of a surprise that "lady" is
decreasing in terms of frequency, which, by the way, also
holds true for "gentleman". Once again, careful observation
of authentic language data calls into question long-
established intuition-based hypotheses.

Gunther Kaltenboeck's paper breaks new ground in a corpus-
based analysis of information structure. The object of
inquiry is the alternation between it-extraposition and
non-extraposition which has often been said to be linked to
different distributions of weight and information.
Analysing ICE-GB exhaustively, Kaltenboeck explores this
issue empirically. To begin with, it-extraposition turns
out to be the statistically unmarked form, accounting for
almost 90% of all instances. By considering the context of
all 1,918 examples at hand, the author gives a detailed and
considered account of a multitude of syntactic, semantic,
stylistic, pragmatic and information-structure factors that
lead the language user to prefer one of the two
arrangements. To pick out but one factor, non-extraposition
is much more common in writing than in speaking. This,
however, does not pertain to non-extraposed wh-clauses
which are evenly distributed across the two media.
Generally speaking, the scrutiny of (non-)extraposition in
authentic contexts makes it clear that the two
constructional types "do not show a one-to-one
correspondence which would allow easy 'swapping'" (p. 158).

Another innovative paper is provided by Thomas Kohnen who
applies corpus-based methods to the analysis of speech
acts. Since it is difficult (if not impossible) to
operationalise the pragmatic notion of speech act in terms
of linguistic form, he confines himself to performatives
which tend to be realised in a restricted range of formal
structures. The author not only describes the distribution
of performatives across different genres in present-day
English corpora, but also opens up a diachronic perspective
by looking at the Old English section of the Helsinki
Corpus. The tentative results he obtains from the
diachronic point of view lead him to point out important
questions which await further research, e.g. the issue of
the historical development of speech act conventions of
politeness and formality.

Uta Lenk's paper is devoted to a classic research topic in
corpus linguistics which continues to merit attention:
collocational frameworks. The author is particularly
interested in so-called "stabilized expressions" including
the lexeme "time" (e.g. "all + determiner + time") and
their semantic potential. To this end, she investigates
several spoken corpora, including those of British,
American and New Zealand English. Her analysis casts new
light on the specific meanings with which seemingly banal
patterns and their variations are associated. For example,
the stabilized expression "all this time" is used to refer
in a neutral way to a relatively long and continuing period
of time. Conversely, "all that time" tends to include "an
expression of dismay at the extension of the duration of
the period mentioned" (p. 189). Lenk's paper provides ample
evidence that such semantic subtleties of
collocational patterns should receive much more attention
in foreign language teaching if learners are to acquire as
much nativelike communicative competence as possible.

Corpus analysis has no doubt become a standard
("mainstream") methodology in linguistics. Despite (or
because of?) this development, there is an increasing
awareness that corpus-based methods should be reliable and
empirically sound. In this context, Hans Lindquist and
Magnus Levin discuss the issue of comparing data from
different corpora. Such comparisons are often inevitable,
but nonetheless problematic, since different corpora tend
to be compiled according to different standards and to
differ in size, genre and other respects. Thus, results
obtained from a comparison of different corpora should be
taken with a pinch of salt, as the authors convincingly
demonstrate by means of many concrete examples. Furthermore,
very large corpora
may hide genre-specific facts since there is reason to
believe that frequencies in language use are mainly bound
to particular genres rather than to the language as a
whole. Ultimately, the linguist cannot dispense
with a careful consideration of the representativeness and
comparability of the corpus material on which he or she
draws.
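
The size problem at least has a standard remedy, which can be
sketched in a few lines: raw counts are converted to a rate per
million words before they are compared. This is a general
illustration rather than anything taken from Lindquist and
Levin's paper; the function name per_million and all figures
are invented.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size

# Invented figures for two hypothetical corpora of very different size.
small_corpus_rate = per_million(raw_count=150, corpus_size=1_000_000)
large_corpus_rate = per_million(raw_count=9_000, corpus_size=100_000_000)

# The larger corpus has far more hits in absolute terms,
# but the lower rate once size is taken into account.
print(small_corpus_rate, large_corpus_rate)
```

Normalisation only addresses differences in size. It does
nothing about differences in genre or compilation standards, so
the same calculation would have to be repeated genre by genre
before any conclusion could be drawn.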

Corpus-based methods are increasingly applied to diachronic
studies. Accordingly, Manfred Markus looks at the use of
causal connectors in Middle English as opposed to present-
day English. From the wealth of interesting data, he draws
three important general conclusions: (1) whereas in modern
English speakers prefer causal conjunctions (especially
"for" and "because"), adverbs (e.g. "therefore" and "thus")
prevail in Middle English texts; (2) "because" in
particular has changed from a conjunction "of the imprecise
kind" (p. 227), i.e. referring to cause or result, to a
genuinely causal connector in present-day English; (3) co-
occurrences of causal adverbs and conjunctions are typical
of Middle English as, for example, in "right so" and "all
thus".

The present lack of software standardisation is discussed
by Oliver Mason. Corpus-linguistic research turns out to be
affected by what the author calls a "programming dilemma":
for example, software developers are not (and cannot be)
aware of future research questions so that the software at
times proves to be less than optimal for the issue at hand.
On the other hand, corpus linguists who want to develop
their own tailor-made software program have to start from
scratch and face a very time-consuming process. In seeking
to provide a way out of this dilemma, the author describes
The Corpus Universal Examiner (CUE) System, a modular
software program, and Qwick, a (simple but robust) corpus
browser making use of CUE. They are available free of
charge. What is more, the modularisation of the software
allows for its application in many research projects since
it is possible to adopt suitable modules and complement
them with software modules developed individually.

Anneli Meurman-Solin attempts to re-categorise multi-word
verbs on the basis of different strengths of cohesive ties
that hold between verb and preposition. This also leads to
a re-evaluation of the distinction between non-idiomatic
free combinations of verb and preposition and idiomatic
multi-word verbs. Special attention is paid to the use of
"put" in complex-transitive complementation. The author
argues that while the description of the clause pattern in
"He put the evidence before the jury" as SVOA is, in fact,
plausible, the idiomatic use of "put before" in "He put
work before family" should be subsumed under the
ditransitive complementation type: "work" and "family"
function as two objects required by the idiomatic multi-
word verb "put before" which has a distinct figurative
meaning. Many other examples which support the author's
view are obtained from the BNC. This paper nicely
exemplifies the way in which language data themselves may
lead to a reassessment of existing grammatical models.

Intonationists also benefit from corpus-linguistic methods.
Ilka Mindt describes significantly frequent prosodic cues
at speaker turns as obtained from the analysis of parts of
the Lancaster/IBM Spoken English Corpus (SEC). In
principle, there are two prosodic patterns which are
formally different (considering F0-levels before and after
the turn) and which fulfil different textual functions: (1)
the "discontinuity pattern" consists of an extremely low
endpoint before the turn and a very high starting point
after the turn, signalling the discontinuity of a specific
topic; (2) in the "continuity pattern", F0-levels before
and after the turn are much closer together, indicating the
continuation of the topic at issue.

Tore Nilsson presents a crisp and interesting analysis of
noun phrases (NPs) in British travel texts. His 100,000-
word corpus covers three categories: British tourist
brochures, articles from the Sunday Times Travel Supplement
and British travel guides. The results show, for example,
that the newspaper articles display the simplest NP
structures, whereas travel guides in particular are
characterised by heavy NP postmodification. The author
suggests some general explanations for those findings,
mainly centering around the different communicative
functions fulfilled by the text types.

Linguists at the University of Nijmegen have undoubtedly
been in the vanguard of the development of a corpus-based
approach to the systematic description of language use.
Nelleke Oostdijk gives a progress report on the TOSCA
(Tools for Syntactic Corpus Analysis) descriptive model.
Especially with regard to the syntactic analysis of
authentic spoken data, corpus linguistics has brought to
light the need for a restructuring of existing descriptive
grammars such as the Quirk grammars. The author, for
example, points out how the TOSCA descriptive model deals
with hesitation signals (e.g. "er") and discourse markers
(e.g. "I mean") which elude traditional grammatical
frameworks based on hierarchical relations of immediate
constituency. The on-going development of the TOSCA
descriptive model is an ambitious and impressive project in
that it aims to accommodate the grammatical model to real
language use. In so doing, the project clearly exposes the
fallacy of considering syntax an entirely autonomous level
of description.

Minna Palander-Collin focuses on the use of the evidential
or epistemic expression "I think" in the language of
husbands and wives in seventeenth-century letters. Two main
conclusions are drawn from the quantitative and qualitative
analysis of parts of the Corpus of Early English
Correspondence: (1) in general, wives use "I think" much
more often than husbands; (2) in particular, wives turn out
to use "I think" predominantly for interpersonal purposes
(e.g. in order to be conventionally indirect or to
apologise). This paper highlights the importance of corpus
analyses for empirically sound gender studies even in the
field of historical linguistics.

Pam Peters investigates the use of synthetic and analytic
comparatives and superlatives with 60 common disyllabic
adjectives in the BNC. Almost all adjectives are attested
in the two possible comparative and superlative forms
respectively. However, some general, though at times
surprising and contradictory, trends can be detected: (1)
disyllabic adjectives ending in "-y" tend to occur in the
synthetic pattern (e.g. "easy/easier/easiest", but not
"worthy" whose comparative and superlative forms usually
are "more worthy" and "most worthy"); (2) quite a few
disyllabic adjectives (e.g. "deadly") are shown to have a
"crossover" pattern in that they habitually form analytic
comparatives (e.g. "more deadly") but synthetic
superlatives (e.g. "deadliest"). To a certain extent, those
crossover patterns can be explained by collocational
factors since some adjectives are often used in routinised
phrases with the synthetic superlative (e.g. "deadliest
weapon"). In conclusion, the author correctly suggests that
the adjective paradigm seems to be "splintered rather than
simply split" (p. 311).

The formal realisations and the functions of the present
perfect are explored by Norbert Schlueter. The empirical
and semantic analysis of spoken and written as well as
British and American corpus material leads the author to
identify two distinct functions of the present perfect:
it refers either to an "indefinite past" or to a
"continuative past" (both functions are sub-categorised
further). It is shown that specific forms of the present
perfect (e.g. active progressive) are strongly linked to
specific functions (e.g. continuative past). As to the
range of functions the present perfect fulfils, it is
particularly interesting to see that the least common
function (i.e. continuative past) tends to be marked by a
temporal marker in two-thirds of all instances: quite
obviously, there seems to exist a correlation between low
frequency of function and high rate of linguistic
specification.

The design and compilation of the Rostock Historical
Newspaper Corpus is described in detail by Kristina
Schneider. The 600,000-word corpus comprises British
newspapers from 1700 to the present at 30-year intervals
which were selected according to external criteria
(circulation, price, frequency and time of publication) and
internal criteria (news content, non-news content, layout).
Thus, a large and powerful diachronic database is now
available to linguists interested in historical
developments in newspaper language and/or stylistic
differences between down-market, mid-market and up-market
newspapers across time.

Although the ICE project comprises regional subcorpora of
only one million words each, their contrastive analysis
allows for dialectological research into idioms, which is
the topic of Paul Skandera's paper. He explores
peculiarities in the use of idiomatic word combinations in
Kenyan English against the background of British English
usage (as laid down in ICE-GB). Kenyan English is shown to
make use of idioms rarely or never attested in ICE-GB (e.g.
"jerrican") and variant formal realisations of British
English idioms (e.g. "quite fine"). Furthermore, idioms are
used with different meanings (e.g. "whereby") and there are
local coinages as well as loan translations/borrowings from
indigenous languages (e.g. "jua kali"). It is to be hoped
that this paper will stimulate corpus linguists to pursue
idiomatic research on the basis of ICE data.

The issue of English-Swedish translations is discussed by
Mikael Svensson. Swedish translators of English texts are
faced with the serious problem that while English allows
several elements before the finite verb, Swedish allows
only one. Taking into account the importance of the
sentence-initial, thematic position for textual
progression, the author seeks to offer a set of principles
according to which translators may choose one specific
preverbal element. For example, if the English sentence has
an initial element which fulfils a discourse-organising
function (be it the subject or not), it should remain in
initial position. If necessary, the subject should be moved
into postverbal position, and heavy constituents should be
placed in sentence-final position. Svensson's suggestions
illustrate the immediate relevance of analyses of parallel
corpora (e.g. the ESPC) to translation studies.

Bernadette Vine focuses on the methodological challenges
which she encountered in her functional approach to
directives in spoken corpora. At the outset, the
identification of functional entities, such as directives,
which have a virtually unlimited number of formal
realisations poses a serious problem for the formalisation
and automation of the search query. What remains is either
a manual or a selective procedure. This article reminds the
reader of Thomas Kohnen's comments on the limitations of
corpus-based methods in pragmatic research (see above).
Nevertheless, the author ends with an encouraging note of
optimism: "Getting things done in an analysis of how people
get things done is complicated and time-consuming, but also
very interesting and rewarding." (p. 374)

In the late sixteenth century, language users had to choose
between two possible second person singular pronouns: "you"
or "thou". In analysing material from the Corpus of English
Dialogues, Terry Walker compares the use of those pronouns
(and their variants) in English drama (i.e. constructed
speech) and authentic speech from witness depositions. In
all texts, "you" turns out to be the unmarked and neutral
form. "Thou", on the other hand, represents the marked form
for specific purposes (e.g. to express affection or
intimacy) and, in quantitative terms, is shown to have
already gone into a decline. Furthermore, men use "thou"
more often than women.

In the final paper, Keith Williamson describes the lexico-
grammatical tagging system that has been used in the
historical linguistic atlas projects at the University of
Edinburgh, covering Early Middle English and Older Scots.
One of the many problems is caused by the enormous number
of orthographic (and phonological) variants. It seems to be
necessary to consider etymological information so that the
tagging procedure can be based on a pre-defined set of
linguistic forms which have derived from a specific etymon.
The issue of automatic parsing is even more complex, but
should be pursued in future research since it would allow
for the syntactic analysis of texts from a period of time
in which language was in a mesmerising state of flux. On
the whole, the author points out some important aspects of
adjusting synchronic corpus technology to diachronic needs.

Critical evaluation

Christian Mair and Marianne Hundt have edited an excellent
selection of papers. All articles are of good quality,
in both content and style, and the proof-reading
turns out to have been almost perfect. Only very few errata
remain (e.g. *"Englis" on p. 320, *"decsribed" on p. 385).
The volume covers a wide range of linguistic fields to
which corpus-based methodology proves relevant. Living up
to the book title, many empirical analyses of specific
linguistic phenomena are complemented with thought-
provoking discussions of either the implications and
applications of the results in a wider setting or of
theoretical and methodological principles and problems.
Thus, it is to be hoped that not only will corpus
professionals closely peruse "Corpus Linguistics and
Linguistic Theory", but also that colleagues who are
still sceptical about corpus linguistics will be tempted to
get involved with corpus-based methods. Let me emphasise,
though, that corpus linguists should not pay too much
attention to the kind of criticism that has been put
forward by generativists in particular for forty years now.
Consider the way Noam Chomsky, in a response to Bas Aarts,
rebuffs the corpus-linguistic enterprise in its entirety:
"You don't take a corpus, you ask questions. You do exactly
what they do in the natural sciences. (...) You have to ask
probing questions of nature. That's what is called
experimentation, and then you may get some answers that
mean something. (...) You can take as many texts as you
like, you can take tape recordings, but you'll never get
the answer." (p. 6) The claim that corpus linguists are,
from the outset, unable to provide scientific answers to
linguistic questions is, to say the least, utterly
ridiculous. The conference proceedings under discussion
give thirty impressive examples of the amazing extent to
which careful analyses of authentic language in real
contexts result in important answers to central (and
peripheral) linguistic questions - answers which are
difficult (if not impossible) to obtain otherwise, answers
which - in the editors' words - represent "detailed and
testable accounts of language use in all its baffling
complexity rather than a postulated underlying language
system" (p. 3).

Biographical note

Joybrato Mukherjee is an Assistant Professor of Modern
English Linguistics at the English Department of the
University of Bonn. His research interests include corpus
linguistics, stylistics, textlinguistics, syntax,
intonation and EFL teaching. He is currently working on a
corpus-based analysis of ditransitive verbs and their
complementation patterns.
