17.1312, Review: Corpus Ling: Sampson & McCarthy (2006)

Fri Apr 28 20:04:45 UTC 2006

LINGUIST List: Vol-17-1312. Fri Apr 28 2006. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Lindsay Butler <lindsay at linguistlist.org>

This LINGUIST List issue is a review of a book published by one of our
supporting publishers, commissioned by our book review editorial staff. We
welcome discussion of this book review on the list, and particularly invite
the author(s) or editor(s) of this book to join in. To start a discussion of
this book, you can use the Discussion form on the LINGUIST List website. For
the subject of the discussion, specify "Book Review" and the issue number of
this review. If you are interested in reviewing a book for LINGUIST, look for
the most recent posting with the subject "Reviews: AVAILABLE FOR REVIEW", and
follow the instructions at the top of the message. You can also contact the
book review staff directly.


Date: 27-Apr-2006
From: Jonathan Clenton < jclenton at lang.osaka-u.ac.jp >
Subject: Corpus Linguistics: Readings in a Widening Discipline 

-------------------------Message 1 ---------------------------------- 
Date: Fri, 28 Apr 2006 16:01:28
From: Jonathan Clenton < jclenton at lang.osaka-u.ac.jp >
Subject: Corpus Linguistics: Readings in a Widening Discipline 

Announced at http://linguistlist.org/issues/17/17-1006.html 

EDITORS: Sampson, Geoffrey; McCarthy, Diana 
TITLE: Corpus Linguistics 
SUBTITLE: Readings in a Widening Discipline 
PUBLISHER: Continuum International Publishing Group Ltd 
YEAR: 2006 

Jonathan Clenton, Graduate School of Language and Culture, Osaka 
University, Japan 


This book starts by showing just how far corpus linguistics has come, 
both since 'electronic corpus linguistics' became commonplace and 
since the days 'B.C.' (before computers). Sampson and McCarthy point 
out that as early as the eighteenth century ''Dr Johnson based his 
famous English dictionary in part on a collection of over 150,000 
quotations'', which they say ''was certainly a corpus of a sort.'' The 
editors have made 
significant contributions to the area of corpus linguistics and here they 
provide a coherent presentation of the approach enunciated in 
various arenas over the years. The result is a collection of articles 
intended as a basic source book of 'background knowledge' for 
students working in the field of corpus linguistics. 

The 42 papers in this volume will be familiar to most who teach in the 
area. They include work by Fries, Francis, Aarts, Altenberg, Hanks, 
Biber & Finegan, Sinclair, Collins, Church, Brown, Ihalainen, Hellberg, 
Rissanen, Burnage & Dunlop, Leech & Fallon, Frances, Hindle & 
Rooth, Louw, Marcus, Kita, Briscoe & Carroll, Tent & Mugler, 
Charniak, Mindt, Bod & Scha, Hasund & Stenström, Carletta, Werry, 
Resnik & Yarowsky, Hyland & Milton, Core, McEnery, McKelvie, Pols, 
Bohmova & Hajicova, Sampson, Campione & Veronis, Kilgarriff, and 
Grabe & Post. In addition to the usual author and subject matter 
indices, there is a substantial glossary that students will find 
invaluable. The book is organised chronologically by the earliest 
date on which each item was presented to the public, for example as 
an oral presentation or a web page. Each paper begins with a short 
introduction by the editors and closes with a set of notes. The notes 
update or clarify the text of the paper, with occasional valuable 
references to web locations. The 
book ends with a combined bibliography comprising all the work cited 
by the articles in the collection (26 pp) and, significantly, a short list of 
relevant web-sites followed by an index.


This collation of papers covers a lot of ground in over 500 pages, so a 
review of any reasonable length will necessarily be selective. There 
are numerous features that make the book easily accessible and 
thoroughly rewarding to read. Compendia of this kind with so many 
contributors are often disjointed with very little uniformity from chapter 
to chapter in terms of theme and style. This is not the case here.  The 
theme and style are surprisingly consistent and the editors' 
introductions to each chapter contribute to the cohesive whole. There 
are also many cross-references between chapters, allowing the 
editors to build upon the foundations of other contributors' work and, 
therefore, eliminate redundancies.  Some areas that might be 
considered central to corpus linguistics are missing from the contents: 
the readings do not include work on learner corpora, corpus-based 
teaching material, or how corpora can be used by language learners 
themselves. But the editors argue, and I agree, that 
compartmentalization of the volume by topic would serve to make the 
readings less accessible to newcomers. This collection of papers 
shows just how much corpus linguistics has evolved as an activity; it 
is not a broader guide outlining practical applications for language 
teaching. This is a welcome focus, and probably makes the 
readings a stronger collection than if they had attempted to include 
everything in the field.

The volume shows just how diverse and complex corpus-based 
research can be. The contents range from the earliest contribution, 
which describes the structure of English using a corpus of 250,000 
words of telephone conversation painstakingly compiled 'B.C.', to a 
later paper that challenges generative linguists' claims that we have 
a system of rules in our heads distilled from experience. Rather, Bod 
and Scha (1996) propose, human language users have a *corpus* in their 
heads derived from a lifetime's exposure to language. More recent 
still is Sampson's (1999) own contribution, which uses evidence from 
the CHRISTINE corpus to show that grammatical complexity continues to 
develop throughout life, well beyond the alleged 'critical period' at 
around the time of puberty. Such examples indicate how broad the 
readings' coverage is; they are not meant to show that the book 
consistently argues in favour of corpus-based research over 
generative linguists' intuitions. 

One should not, then, expect this book to challenge generative 
linguistics from the standpoint of corpus linguistic investigation. It is not 
a popularising work directed towards converting the world of 
generative linguistics to corpus-based methods. That said, the editors 
do point out that empirical evidence reveals patterns that are actually in 
use quite heavily when generative linguists' intuitions suggest they are 
not. As such, the examples cited throughout the book provide some 
very concrete data and provocative arguments. Nevertheless, it 
appears unlikely that corpora will ever be used very widely by 
generative grammarians. This is in spite of the fact that some 
generative discussions of language have been based on corpora, and 
have demonstrated potential for advancing generative theory.  
Corpora may well yet prove to be an excellent source for verifying 
linguistic hypotheses.

Overall, the book is an extremely valuable resource in its own right, 
and not only as a reference for corpus linguists. Those newly 
interested in the area will also find the volume an essential 
collection, not least for understanding the wider field of corpus 
linguistics and the historical developments it has undergone.  The 
richness of the book lies in the editors' vast collective experience 
and knowledge, which they bring to presenting the field's development 
in terms of linguistic research. A strong feature of the 
book is the inclusion of many useful figures and tables that serve to 
capture the research findings in a concrete manner for the reader. 
This excellent book should be required reading for students and 
teachers involved in corpus-based research and will be generally 
useful to anyone who seeks a more comprehensive understanding of 
the resurgence of corpus-based linguistics. This is an impressive 
volume that demonstrates just how far the field has progressed over 
the last 50 years.

[The 2004 edition of this book was reviewed in 
http://linguistlist.org/issues/16/16-98.html --Eds.] 


Jonathan Clenton teaches English and corpus linguistics at Osaka 
University's Graduate School of Language and Culture, Japan. His 
current research focuses on developmental work on vocabulary 
