35.1160, Review: Corpus-Assisted Discourse Studies: Gillings, Mautner, Baker (2023)

Mon Apr 8 13:05:02 UTC 2024

LINGUIST List: Vol-35-1160. Mon Apr 08 2024. ISSN: 1069-4875.

Subject: 35.1160, Review: Corpus-Assisted Discourse Studies: Gillings, Mautner, Baker (2023)

Moderators: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Daniel Swanson, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn, Natasha Singh, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Justin Fuller <justin at linguistlist.org>

LINGUIST List is hosted by Indiana University College of Arts and Sciences.
================================================================

Date: 08-Apr-2024
From: Aleksandra Uttenweiler [aleksandra.uttenweiler at gmail.com]
Subject: Applied Linguistics: Gillings, Mautner, Baker (2023)

Book announced at https://linguistlist.org/issues/34.2968

AUTHOR: Mathew Gillings
AUTHOR: Gerlinde Mautner
AUTHOR: Paul Baker
TITLE: Corpus-Assisted Discourse Studies
SERIES TITLE: Elements in Corpus Linguistics
PUBLISHER: Cambridge University Press
YEAR: 2023

REVIEWER: Aleksandra Uttenweiler

SUMMARY

‘Corpus-Assisted Discourse Studies’ is part of the ‘Elements in Corpus
Linguistics’ series published by Cambridge University Press and edited
by Susan Hunston. This series provides compact introductions to the
main areas of the field. In this element, Gillings, Mautner, and
Baker focus on the use of corpus tools in discourse studies, addressing
researchers interested in working in this area. The book comprises
seven sections that provide an overview of methods in Corpus-Assisted
Discourse Studies (CADS) and reflect on their practical application.

The first section opens with a brief definition of CADS and identifies
the target audience as master’s or PhD students and teachers who wish
to introduce their students to the method. While the authors note that
the term ‘discourse’ is a “notoriously fuzzy notion” (p.1), they use
it in a broad sense to include all naturally occurring longer
stretches of language that perform social functions. CADS, too, is
defined broadly: it refers to a research approach that focuses on
social phenomena rather than purely linguistic ones.

The aim of this element is to provide a ‘how-to’ guide to CADS (p.2)
that explains how corpus tools can be implemented in discourse studies
and critically reflects on the methodology. The book is structured
according to these goals, guiding the reader through the process of a
model CADS study in Sections 2-5, followed by a reflective part in
Sections 6 and 7. The authors acknowledge potential shortcomings of
the book, highlighting the brevity of the Cambridge Elements format
and the space limitations it imposes. They also transparently explain
their focus on the English language and on the tradition of British
linguistics.

Section 2 discusses the advantages of using corpora in discourse
studies (DS). It begins with a historical note on CADS, and the authors
emphasize the methodological debates surrounding CADS. They briefly
consider the distinction between ‘corpus-based’ and ‘corpus-driven’,
as well as other labels ascribed to corpus linguistic approaches
applied in DS, opting for the label ‘corpus-assisted’ as an
“umbrella term” (p.5). The section discusses limitations of
traditional DS methods, particularly their limited scalability, since
they rely on close reading. It then highlights the advantages of
corpus tools, which scale better and focus on uncovering and
systematically describing patterns. These patterns are then
interpreted using qualitative discourse-analytical approaches, taking
into account the sociopolitical context in which they appear.
According to the authors, the main advantage of CADS is that it
achieves “a useful synergy between CL [Corpus Linguistics] and DS
[Discourse Studies]” (p.7) through triangulation, that is, the
combination of corpus tools with discourse-analytical methods.

Section 3 begins the practical ‘how-to’ part by discussing corpus
building. The authors argue for a definition of a corpus that includes
representativeness of a particular language variety. The discussion
distinguishes two types of corpora: reference corpora and specialized
corpora. The first part critically examines reference corpora,
questioning whether they are in fact representative of the language
variety they aim to depict, with a focus on large English corpora such
as COCA and BNC2014. The second part considers self-compiled
specialized corpora and how to select the data needed to answer a
specific research question. It raises key questions to consider when
building or choosing a corpus, including ethical concerns, working
with oral texts, the size of a self-compiled corpus, and the inclusion
of markup.

Section 4 introduces key corpus tools that are useful for CADS,
devoting one subsection to each: frequency, concordance analysis,
collocation analysis, and keyword analysis. The frequency subsection
covers creating wordlists, various techniques for calculating word
frequencies, exploring frequent n-grams, visualizing dispersion, and
the distinction between raw and relative frequency. The keyword
analysis subsection emphasizes comparing corpora. In addition, the
subsections introduce basic terms relevant to data preparation, such
as tokenization, part-of-speech tagging, and semantic tagging. The
importance of co-text is also highlighted, along with the problems
that may arise when a corpus program displays too little co-text for
reliable interpretation.
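As a brief illustration of two of the calculations mentioned here, the
following Python sketch normalizes raw counts to relative frequencies
(per million tokens) and ranks keywords by comparing a study corpus
with a reference corpus. It is a minimal sketch under stated
assumptions, not the procedure described in the element: the naive
tokenizer, the function names, and the choice of Dunning's
log-likelihood (G2) as the keyness statistic are assumptions made for
illustration, and the toy sentences are invented.

```python
import math
import re
from collections import Counter

# Illustrative sketch only. Assumptions: a naive tokenizer and
# Dunning's log-likelihood (G2) as the keyness statistic.


def tokenize(text: str) -> list[str]:
    # Lowercase word forms only; real corpus tools tokenize more carefully.
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(count: int, corpus_size: int, per: int = 1_000_000) -> float:
    # Normalize a raw count to a rate per `per` tokens, so that
    # corpora of different sizes can be compared.
    return count / corpus_size * per


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    # G2 for a word occurring a times in a study corpus of c tokens
    # and b times in a reference corpus of d tokens.
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study: list[str], reference: list[str], top: int = 5) -> list[tuple[str, float]]:
    # Rank words that are relatively more frequent in the study corpus
    # than in the reference corpus ("positive" keywords) by G2.
    study_freq, ref_freq = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for word, a in study_freq.items():
        b = ref_freq[word]  # Counter returns 0 for unseen words
        if a / c > b / d:
            scored.append((word, round(log_likelihood(a, b, c, d), 2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Usage with invented toy sentences:
study = tokenize("I respectfully dissent. I dissent from the majority.")
reference = tokenize("The majority of days are sunny. The weather is mild.")
print(relative_frequency(study.count("dissent"), len(study)))
print(keywords(study, reference))
```

On toy input of this size the scores carry no statistical weight;
keyness is normally computed over much larger corpora, and dedicated
corpus software typically offers several alternative keyness measures.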

While the use of these established tools has been described in earlier
textbooks on Corpus Linguistics (e.g. Stefanowitsch 2020), the
authors focus on their use in discourse studies. They provide
multiple examples of how each tool has been used in previous research,
demonstrating various ways in which corpus methods can be implemented.
Additionally, they indicate which corpus software supports each
tool.

Once each tool has been introduced individually, Section 5
demonstrates how the tools can be combined in an ongoing project
analyzing judicial dissents. The process begins by identifying
‘lexical hooks’, specific lexical elements that can be used to query
a corpus program. Concordance and collocation analyses can then be
used to uncover further politeness markers of interest. The example
shows that not every tool described earlier is useful in every
project and that some may lead to irrelevant findings. The authors
emphasize the importance of balancing orderly progression with the
messiness of the analytical process. However, they stress that this
is only one example; other projects may use the tools differently and
in a different order.
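The workflow described for Section 5, querying a lexical hook and then
reading concordance lines and collocates, can be sketched in a few
lines of Python. This is an illustrative sketch under stated
assumptions, not the authors' procedure: the hook word, the window
sizes, the function names, and the toy sentence are hypothetical, and
collocates are ranked here by raw co-occurrence count within a window
rather than by an association measure such as MI or log-likelihood,
which corpus software would normally provide.

```python
import re
from collections import Counter

# Illustrative sketch only. Assumptions: a naive tokenizer, a hypothetical
# lexical hook ("dissent"), and raw co-occurrence counts in place of an
# association measure.


def tokenize(text: str) -> list[str]:
    # Lowercase word forms only; real corpus tools tokenize more carefully.
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens: list[str], hook: str, width: int = 4) -> list[str]:
    # Key Word In Context: one line per hit, with `width` tokens of
    # co-text on each side. Too small a width may not give enough
    # co-text to interpret a hit.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == hook:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def window_collocates(tokens: list[str], hook: str, span: int = 3) -> Counter:
    # Count the words occurring within `span` tokens to the left or right
    # of each hit (raw frequency, not a statistical association score).
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == hook:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


# Usage with an invented toy text:
toks = tokenize("I respectfully dissent. With respect, I must dissent from this ruling.")
for line in kwic(toks, "dissent"):
    print(line)
print(window_collocates(toks, "dissent").most_common(2))  # → [('i', 3), ('respect', 2)]
```

Widening `width` is the simplest way to address the co-text problem the
element raises; in practice, corpus programs also let the analyst open
the full source text behind each concordance line.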

While the previous sections offer a practical account of conducting
CADS research, from selecting a corpus to interpreting findings, the
final two sections critically reflect on the methodology. Section 6
outlines the most important limitations and pitfalls that need to be
considered when undertaking a CADS project. Here, the authors discuss
cases where CADS may not be the optimal choice, as well as best
practices for working with corpora. The second subsection addresses
potential challenges that may arise at different stages of the
research process.

Section 7 reflects further on triangulation as the core of CADS. The
authors also discuss the future of CADS, including new ways of
measuring keyness and categorizing keywords. Furthermore, they provide
some insight into work in the field on languages other than English.
They also address the accessibility of corpus research methods,
literature, and skills, as well as political limitations. The final
subsection reflects on the messiness of CADS research, which, as
Section 5 shows, is inevitable. The authors suggest that the best way
to deal with this mess is to embrace it through flexibility and
non-linearity while maintaining balance through a carefully considered
and well-documented process.

The element closes with a useful appendix that briefly describes some
of the most common corpus programs.

EVALUATION

While the book is aimed at researchers and students new to the use of
corpus linguistic tools in discourse studies, it assumes that readers
already have some knowledge of traditional discourse studies methods
and a basic understanding of Corpus Linguistics. It is therefore best
suited to readers who already possess theoretical knowledge in both
fields but want to develop practical skills in using corpus tools and
programs in a combined approach. As for the systematic use of the
tools demonstrated, early-career researchers may find it difficult to
see the rationale for applying them in a particular way and order
suited to their own research. As the authors note about the order in
which they apply the tools, “[t]here is nothing canonical about these
choices, but experience suggested that they would be reasonable for
the project at hand” (p.40). In the examples provided, the choice of
tools appears to rest largely on intuition, and it would be helpful to
suggest a starting point for those lacking experience.

A broad definition of discourse makes the element applicable to a wide
range of uses in discourse studies. However, concerning the definition
of CADS, the methodology seems to have a more limited scope than
anticipated. The authors define CADS projects as having “a social
question at their center rather than a purely linguistic one” (p.1),
especially concerning power structures and social hierarchies tied to
linguistic choices. In current research, however, CADS studies do not
necessarily center on social questions, and this definition is
inconsistent with the broad definition of ‘discourse’ stated earlier
(see Flowerdew 2023: 126–127). In fact, the sample project discussed
in Section 5 has a linguistic question at its core (what role
“language plays in ‘doing’ dissent”, p.39) and is merely informed by
the social context. Although the definition of CADS theoretically
limits the scope of the element, the tools and methods presented can
still be applied to a broad range of research projects, including
those that do not focus on social questions.

Considering the space limitations, it is worth questioning the
inclusion of certain topics and aspects of CADS that might be less
relevant to the overarching scope of the element, for instance, the
discussion of representativeness of large reference corpora in Section
3 or the discussion of Critical Discourse Analysis in Section 2. These
are important aspects that merit discussion in a full-length book.
However, given the practical aim of the element, the space might have
been better used to address methodological questions in more depth,
for example by elaborating on the label ‘corpus-assisted’.

The element builds on previous work on technical aspects of CADS, most
importantly Baker (2006). While it covers the same topics and tools as
Baker’s monograph, it is more up to date, reflecting developments in
the availability of corpus tools and in technical solutions. Because
it draws on many examples from recent CADS research, the solutions it
presents are applicable to a wide range of research questions. The
authors have bridged the 17-year gap between the two publications by
providing an engaging and concise introduction, especially for readers
new to CADS. The text introduces the technicalities of CADS and
invites readers to reflect on the possibilities, potential, and
limitations of the methodology, regarding both theory (by reflecting
on which approach to discourse we take) and praxis (by reflecting on
the balance between close reading of discursive texts and the use of
statistical tools).

Most importantly, readers should engage more closely with the question
of how they define ‘discourse’ and CADS for the purposes of their
research, and with what they understand by ‘corpus-assisted’. As shown
above, the breadth of these terms is an advantage in a compact
element, bringing together various perspectives and approaches and
making the demonstrated tools accessible to a broad audience.
Considering this, the element provides a comprehensive ‘how-to’ guide
to CADS, much needed given developments in recent years, and fills a
gap in the literature on methodology. An important advantage of the
book is its extensive and transparent reflection on best practices in
CADS research. The element is therefore a great starting point for
everyone interested in engaging not only with the practical use of
corpus tools for discourse studies but also with broader theoretical
reflections on the methodology.

REFERENCES

Baker, Paul. 2006. Using corpora in discourse analysis (Continuum
Discourse Series). London, New York: Continuum.

Flowerdew, Lynne. 2023. Corpus-based discourse analysis. In Michael
Handford & James Paul Gee (eds.), The Routledge handbook of discourse
analysis, 2nd edn., 126–138. Routledge.

Stefanowitsch, Anatol. 2020. Corpus linguistics: A guide to the
methodology. Zenodo.

ABOUT THE REVIEWER

Aleksandra Uttenweiler holds a master's degree in German Linguistics
from Leipzig University. She is a PhD student at Leipzig University
and Leiden University. Her research interests include Positioning
Theory, Discourse Analysis, and Corpus Pragmatics.
