37.1118, Reviews: Applying Corpus Linguistics to Illness and Healthcare: Elena Semino; Paul Baker; Gavin Brookes; Luke Collins; Tony McEnery (2025)

Wed Mar 18 19:05:02 UTC 2026

LINGUIST List: Vol-37-1118. Wed Mar 18 2026. ISSN: 1069 - 4875.

Subject: 37.1118, Reviews: Applying Corpus Linguistics to Illness and Healthcare: Elena Semino; Paul Baker; Gavin Brookes; Luke Collins; Tony McEnery (2025)

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Helen Aristar-Dry <hdry at linguistlist.org>

================================================================

Date: 18-Mar-2026
From: Anastasiia Petrenko [ap2315 at cam.ac.uk]
Subject: Elena Semino; Paul Baker; Gavin Brookes; Luke Collins; Tony McEnery (2025)

Book announced at https://linguistlist.org/issues/36-2826

Title: Applying Corpus Linguistics to Illness and Healthcare
Publication Year: 2025

Publisher: Cambridge University Press
           http://www.cambridge.org/linguistics
Book URL:
https://www.cambridge.org/ch/universitypress/subjects/languages-linguistics/applied-linguistics-and-second-language-acquisition/applying-corpus-linguistics-illness-and-healthcare?format=PB

Author(s): Elena Semino; Paul Baker; Gavin Brookes; Luke Collins; Tony
McEnery

Reviewer: Anastasiia Petrenko

SUMMARY
This book, “Applying Corpus Linguistics to Illness and Healthcare” by
Elena Semino, Paul Baker, Gavin Brookes, Luke Collins and Tony
McEnery, is a recent and substantial contribution to applied corpus
linguistics, offering a comprehensive methodological and analytical
guide to corpus-based research in the domain of illness and
healthcare.
The book consists of thirteen chapters, which can be grouped into
three main clusters: (i) foundational steps of corpus-based research,
including research design, data collection and ethics; (ii)
methodological and analytical challenges specific to
healthcare-related discourse, explored through a number of case
studies; and (iii) dissemination of findings in society and future
directions for corpus-based research. Taken together, these chapters
provide a coherent progression from the initial formulation of
research questions and compilation of datasets to the interpretation,
application and public communication of research findings.
Chapters 1-4: Laying the groundwork for corpus-based research
The opening chapters familiarise readers with the fundamentals of
corpus linguistics. Chapter 1 introduces key principles, terminology
and analytical tools, including frequency lists, concordances,
collocational analysis and keyword analysis, illustrated through
examples that are developed in later chapters. The authors also
provide a clear overview of available corpus resources and free
analytical tools, explaining their functions. This chapter is
particularly effective in making the book accessible to a wide
audience: novice researchers benefit from the step-by-step guidance,
while more experienced scholars can selectively consult the chapters
relevant to their interests and broaden their awareness of the
healthcare issues that can be examined through corpora. Each chapter
concludes with a carefully curated reference list, enabling readers
to pursue more advanced methodological or theoretical sources.
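As a rough illustration of what two of the tools listed for Chapter 1
compute, the Python sketch below builds a frequency list and a simple
keyword-in-context (KWIC) concordance. It is not taken from the book,
which works with existing corpus software; the toy sentences, the
crude tokeniser and the function names are illustrative assumptions
only.

```python
import re
from collections import Counter

def tokenise(text):
    """Lower-case the text and split it into word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens, n=5):
    """Return the n most frequent word types with their raw counts (ties keep first-seen order)."""
    return Counter(tokens).most_common(n)

def concordance(tokens, node, span=3):
    """Return KWIC lines: up to `span` words either side of each occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            # Right-align the left context so the node word forms a central column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

# Hypothetical toy corpus, invented for illustration.
corpus = (
    "My anxiety got worse in the evenings. "
    "I don't have bad anxiety but the panic attacks scare me. "
    "Talking to my GP about anxiety really helped."
)
tokens = tokenise(corpus)
print(frequency_list(tokens, 3))  # [('anxiety', 3), ('my', 2), ('the', 2)]
for line in concordance(tokens, "anxiety"):
    print(line)
```

Aligning the node word in a central column is what allows an analyst
to scan recurring left and right contexts, which is the basis for the
collocational reading of concordance lines described in the book.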
Chapter 2 explains different ways in which research questions can be
formulated. The authors outline several possible trajectories and
encourage readers to be flexible: research questions may (i) be
predefined by institutional partners, (ii) be motivated by the
researcher’s linguistic intuition, (iii) be refined after data
analysis, or (iv) be revised, with some initial questions dropped and
new ones added, when corpus data prove unsuitable for addressing
particular issues. This discussion is relevant well beyond corpus
linguistics, offering valuable methodological insights for research
across the humanities.
Chapter 3 addresses data collection, with particular attention to the
specific challenges posed by healthcare-related research. The authors
argue that such studies often require purpose-built corpora rather
than large pre-existing ones. The reader is provided with three case
studies which together give a clear picture of the main source types
and cover the needs of most corpus studies. The first case study
sheds light on compiling a news corpus about obesity, highlighting
the importance of taking into account (i) the types of newspapers;
(ii) the number of articles and words they contribute to the whole
dataset; and (iii) the month in which an article was published, as
the researchers identified a strong association between the number
of articles published and January, when New Year resolutions are
made. The second case study covers work with spoken data, such as
interactions that took place in emergency departments in Australian
hospitals. The section guides the reader through the transcription
procedure, essential metadata and the representation of the collected
data in a searchable format. The third case study is based on
building the Victorian Anti-Vaccination Discourse Corpus from text
archives spanning 1854 to 1906. The authors describe not only the
process of compiling such a corpus but also some potential pitfalls
of working with historical sources, such as variable text length,
uneven temporal distribution and anonymity; these considerations are
relevant not only to corpus linguistics but also to sociolinguistic
and historical linguistic research.
Chapter 4 is dedicated to ethics. The authors set out general
principles for compiling data and anonymising it, paying particular
attention to data retrieved from online forums and articles, as such
texts can easily be traced and identified by other users of the
dataset.
Chapters 5-11: Analytical challenges and methodological solutions
The central chapters address a range of analytical challenges
specific to healthcare discourse. Chapter 5 examines discussions of
cancer on online forums, in which instances of humour and metaphor
were unexpectedly identified. This chapter not only offers a strong
theoretical grounding in topic sensitivity but also presents detailed
linguistic and conversational analyses. The breadth of material
covered and of references cited makes this and the following chapters
an outstanding model not only for undergraduate and postgraduate
students writing degree dissertations but also for doctoral students
and early career researchers aiming to publish in well-established
journals. The chapter is also supplemented with figures and tables,
which make the organisation of the material more coherent and
accessible.
Chapter 6 deals with the challenges posed by the topic of identity
and the approaches available for investigating it. Again, the chapter
opens with an overarching theoretical introduction to how identity is
defined in social studies and why it matters there. Using England’s
Cancer Patient Experience Survey as an example of how healthcare
corpus data can be collected, the authors underline the different
forms of quantitative and qualitative feedback such a survey can
provide. They then discuss the problem that a respondent’s sex cannot
always be identified, as surveys may offer the option “prefer not to
say”. The book nevertheless offers workable alternatives, for example
taking context into account. The chapter concludes with valuable
findings on differences in style, linguistic strategies and language
use between the feedback left by male and female respondents. For
example, female respondents left more personalised comments, which
referred to particular clinicians, used a more diverse range of
pronouns and contained more instances of ‘lovely’, ‘nice’, ‘wonderful’
and ‘amazing’ than those of male respondents (Semino et al. 2025,
pp. 89-97). Male respondents, in turn, made more frequent use of words
denoting time (e.g. ‘months’, ‘period’), related the performance of a
few members of staff to the quality of the entire hospital or the
healthcare system, and used the pronoun ‘you’ more often. Age-related
differences in response strategies were also identified, but only
among male respondents. Such a detailed interpretation of the findings
is useful for social studies and serves as a compelling example of how
corpus findings can be applied in practice, in this case to evaluating
the quality of medical services.
Chapters 7 and 8 offer a well-developed comparison of how corpus
research should be organised when it is based on relatively recent
as opposed to historical data. Chapter 7 focuses on studies of
healthcare material from 2008 to 2017 (taking all its case studies
together). The chapter begins by sketching a set of major potential
trends of change in English and explains that a period-based
investigation can be particularly beneficial, as it is likely to
reveal emerging linguistic patterns that may indicate new ways of
conceptualising health conditions and changing representations of
health phenomena ‘in newspapers, … patient feedback over time, … on an
online forum’ (ibid., p. 101). The introduction to the chapter not
only provides readers with different levels of corpus expertise with
essential theoretical background but also shows the practical
applicability of such research both within and beyond academia.
Throughout the chapter, the reader is guided through every research
stage with valuable advice, explanations of the statistical methods
applied and illustrative examples supporting the argument, with the
results summarised in charts and tables. Moreover, the authors pay
special attention to explaining different methodological techniques
and approaches. For example, in Section 7.4 (ibid., p. 114), they
draw the reader’s attention to the fact that a research design can
change substantially over time and that newly emerging details can
be taken on board. They illustrate this with a study of an online
forum. Initially, they aimed to conduct concordance and collocational
analyses focusing on age-related differences in style and commenting.
However, ‘change over time’, the theme around which the chapter is
centred, may concern not only differences between users of different
ages but also change within an individual: a user’s rhetoric may
change from their first post on the forum to their 60th or their last.
The results shed considerable light on how the conceptualisation of
anxiety changed over this journey, from the first post, in which
people sought advice, to the last, in which they used ‘learned’ (past
tense) and summarised their journey (ibid., pp. 115-116). What remains
unclear from the reviewer’s perspective, however, and might be
pursued in future research, is whether the first posts of individuals
in their 20s differed from those of individuals in their 60s, and how
any such differences unfolded from their first to their last posts.
Nor is it clear why the age parameter was excluded from the study at
the second stage.
Chapter 8 turns to the investigation of ‘change over time’ using
historical corpora. The authors conduct and describe two case
studies, on vaccination attitudes and on sexually transmitted
diseases, based on material from 1854 to 1906 for the former and
from before 1700 for the latter. They justify the temporal cut-off
points with reference to the historical context (a Vaccination Act
was enacted in 1907). They also introduce a valuable technique for
material that may be distributed unevenly over the historical period
in question: comparing it against a reference corpus from the same
period, which minimises the influence of the choice of material on
the objectivity of the results. Moreover, the authors draw parallels
between the debates on vaccination in the 19th century and debates
during the COVID-19 pandemic, offering valuable insights for
comparative discourse studies. In the second case study, the authors
again offer valuable comments and show the limits of relying on
frequency lists alone: one of the names of the disease did not refer
to the disease directly but was a widespread swear word at the time,
a problem resolved by applying collocation analysis. They also
propose a procedure for identifying a disease that is described
indirectly rather than named explicitly in the text (ibid., p. 128).
These recommendations, together with detailed explanations of how
each problem was resolved, from poorly recognised digitised texts to
the polysemy of words and the ambiguity of names, encourage the reader
to adopt a holistic research approach and provide accessible
analytical tools.
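Comparison against a reference corpus, as described for Chapter 8, is
commonly quantified with a keyness statistic. The Python sketch below
computes the log-likelihood (G2) score often used for this purpose in
corpus linguistics. It is offered only as an illustration: the review
does not specify which statistic the authors apply, the function
names are hypothetical, and the counts in the usage example are
invented.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word: study corpus vs reference corpus.

    freq_* are the word's raw counts; size_* are the corpora's total token counts.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the word had the same relative frequency in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # 0 * ln(0) is conventionally taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def per_10k(freq, size):
    """Normalised frequency per 10,000 tokens, used to read off the direction of keyness."""
    return freq * 10_000 / size

# Hypothetical counts: a word in a 200,000-token study corpus vs a 1,000,000-token reference corpus.
g2 = log_likelihood(120, 200_000, 30, 1_000_000)
overused = per_10k(120, 200_000) > per_10k(30, 1_000_000)
print(f"G2 = {g2:.2f}, more frequent in study corpus: {overused}")
```

G2 on its own only says that the two relative frequencies differ; the
normalised frequencies indicate in which corpus the word is more
frequent. With one degree of freedom, G2 values above 3.84 and 10.83
correspond to p < 0.05 and p < 0.001 respectively.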
In Chapter 9, representations of cancer and anxiety are analysed.
Here, the authors base their case studies on keywords and collocations
to shed light on people’s different understandings of their
condition. They stress the importance of applying consistent and
transparent guidelines (ibid., p. 138), underlining that abstract
concepts such as feelings can be interpreted subjectively and can
have more than one representation. They also draw the reader’s
attention to negated forms, which may not be visible in frequency
lists but reverse the meaning, as in ‘I don’t have bad anxiety’
(ibid., p. 138); such an instance may nevertheless contribute to the
frequency of the collocation ‘bad anxiety’. Moreover, the authors
stress how frequently metaphors realised as nouns, verbs and noun
phrases are invoked, with their meanings identified through context
(ibid., pp. 139-143). Metaphors are also at the heart of the second
case study, on cancer. The authors then consider whether such
metaphors should be eliminated from communication about cancer. The
case study shows that patients use violence metaphors more often than
family carers or healthcare professionals do. These metaphors are used
to convey different facets of patients’ experience of illness and
treatment, which highlights the importance of metaphorical devices.
Despite the seriousness of the topic, the researchers were surprised
to identify multiple cases of humour, which opened up a new strand of
the study. The authors conclude that violence metaphors in the domain
of cancer can be both empowering and disempowering and shed light on
patients’ conceptualisation of their condition.
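The negation problem noted for Chapter 9 can be made concrete with a
small sketch. The Python code below counts occurrences of the bigram
‘bad anxiety’ and flags those preceded by a negator within a short
window. The negator list, the three-token window, the function names
and the example posts are hypothetical assumptions, and such a
heuristic can only flag candidates for manual reading of concordance
lines, not replace it.

```python
import re

# Hypothetical, deliberately small negator list (an assumption, not taken from the book).
NEGATORS = {"not", "no", "never", "don't", "doesn't", "didn't", "isn't", "wasn't", "haven't"}

def tokenise(text):
    """Lower-case, normalise curly apostrophes and split into word tokens."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z']+", text)

def bigram_hits(tokens, first, second, window=3):
    """Return (index, possibly_negated) for each occurrence of the bigram `first second`.

    A hit is flagged as possibly negated if a negator appears among the
    `window` tokens immediately before it.
    """
    hits = []
    for i in range(len(tokens) - 1):
        if tokens[i] == first and tokens[i + 1] == second:
            preceding = tokens[max(0, i - window):i]
            hits.append((i, any(t in NEGATORS for t in preceding)))
    return hits

# Invented example posts.
posts = [
    "I don't have bad anxiety, just the odd wobble.",
    "Some days I get bad anxiety before appointments.",
    "Bad anxiety again this morning.",
]
hits = [hit for post in posts for hit in bigram_hits(tokenise(post), "bad", "anxiety")]
raw = len(hits)
unnegated = sum(1 for _, negated in hits if not negated)
print(f"raw count: {raw}; excluding possibly negated hits: {unnegated}")
# raw count: 3; excluding possibly negated hits: 2
```

A fixed window will miss negation expressed further away and will
wrongly flag cases such as ‘not only … bad anxiety’, which is why the
flagged hits still need to be read in context.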
Chapter 10 broadens the focus to social actors and agency, examining
discourse surrounding obesity and psychosis across GPs, patients, the
media and governments. The authors guide the reader through the
application of collocational methods and argue for the utility of
ChatGPT and ‘the remainder method’, paying particular attention to
nomination and predication strategies. Again, concordance lines and
extended context prove useful for identifying collocates and the
range of ways in which a concept can be represented. Another case
study in this chapter focuses on psychosis and the ways in which
voices can be personified and represented as social agents. Here,
the representations vary considerably, from minimal personification
(‘like a person’, ibid., p. 161) to non-human (‘demon, birds, bomb’)
or abstract entities (‘thoughts, scenario, sensation’, ibid., p. 161).
This shifts the focus of the study to degrees of agency and takes it
beyond purely grammatical criteria, which not only advances
qualitative semantic research but also deepens our understanding of
the ways in which agency can be encoded and represented in language.
Chapter 11 investigates legitimation, an issue important not only
within linguistics but also within social studies. The chapter first
reviews different approaches to legitimation, providing a strong
theoretical background, and then turns to the application of corpus
linguistic methods to studying its forms and degrees. To test these
assumptions, the authors conduct a case study of the vaccine
hesitancy debate on Mumsnet and identify different ways in which
posters use authorisation and rationalisation strategies to justify
their positions on vaccines. Legitimation is then studied in a case
study of patient evaluations of healthcare services, in which patients
submitting negative feedback justify their comments by reference to
their past life experience, their experience with the practice, their
conscientiousness as users of medical services and even their
physical characteristics, the latter to account for their intolerance
of the pain they experienced. Legitimation in healthcare thus proves
to depend not only on linguistic means of expression but also on
extralinguistic factors such as personal narratives and the poster’s
self-representation.
Chapters 12-13: What is next?
Chapter 12 focuses on the dissemination of findings not only through
journal articles (predominantly for academics) but also through
interactions with medical staff and institutions and broader outreach
to the public through interviews and news reports. The authors openly
share their experience in all three domains, providing the reader
with a clear sequence of steps to follow when submitting to (and
sometimes adapting articles for) different types of journals. They
are equally candid about obstacles, cases of miscommunication with
stakeholders and the press, and possible ways of addressing such
challenges. They also encourage readers not to be afraid of changes
that arise in the course of research, and recount an unplanned
outcome of one of their projects, ‘The Metaphor Menu for People
Living with Cancer’, which provides ‘a collection of 17 different
metaphors for cancer, accompanied by images’ (ibid., p. 197). The
Menu may shed light on the conceptualisation of the illness and can
serve as a set of recommendations in many healthcare contexts, with
the caveat that every metaphor is heavily context-dependent.
Chapter 13 summarises the main outcomes of the book and opens up
avenues for future research. The authors emphasise that the main
advantage of corpus data lies in identifying patterns in language use
that have implications beyond linguistics, for example in reducing
stigma or improving patient-practitioner communication.
EVALUATION
This volume provides an exceptionally clear and well-structured
methodological guide to corpus-based research in healthcare contexts.
From the outset, the reader is given an overview of the main
approaches and terms in corpus linguistics. The authors also focus on
ethical considerations, including data collection, anonymisation and
collaboration with project partners in sensitive healthcare settings.
The reader is encouraged to examine a wide range of healthcare-related
topics both synchronically and diachronically, with particular
attention paid to cancer, anxiety, obesity, humour, metaphor use,
identity construction, social roles and legitimation. All
methodological recommendations are consistently supported by extended
case studies, illustrative extracts from relevant corpora and a rich
array of figures and tables. This makes the analytical procedures
transparent and the empirical findings accessible to a wide
readership, ranging from novice students to senior researchers. In
this respect, the first four chapters are especially valuable for
readers new to corpus linguistics, as they provide a structured
introduction to key concepts and tools. However, the scope of the
volume extends far beyond the needs of undergraduate and postgraduate
audiences. Rather, it serves as a ground-breaking manual that (i)
covers the main methods and terminology of corpus linguistics and
(ii) demonstrates how corpus-linguistic approaches can be productively
integrated with social and healthcare research. Chapters 12 and 13,
which focus on dissemination and outreach, are particularly
distinctive within the existing literature. The degree of openness
with which the research team reflects on practical challenges,
institutional constraints and publication-related difficulties is
striking. These chapters will be of particular value to professionals
and, especially, early career researchers seeking guidance on
navigating the academic publishing landscape and on communicating
research findings beyond academia. More generally, the volume is
designed so that it can be read sequentially or consulted chapter by
chapter according to specific research interests. The detailed index
(ibid., pp. 214-215) further enhances the book’s usability.
The scholarly value of this volume lies in its interdisciplinary
nature. By bringing together corpus linguistics and social studies,
the book makes a substantial contribution to expanding the reach of
healthcare research beyond academia. The breadth of material analysed
– ranging from disease conceptualisation and representation to
identity, social roles, and legitimation – offers a comprehensive
picture of how healthcare discourse can be systematically explored
through corpora. Each chapter features a concise yet informative
theoretical introduction, drawing not only on corpus linguistics but
also on insights from historical linguistics, semantics, pragmatics,
and discourse analysis. The extensive use of charts, tables, and
figures further enhances the accessibility of this diverse and
interdisciplinary material.
In this respect, the volume can be seen as complementing and extending
established handbooks in corpus linguistics, such as McEnery & Hardie
(2012), Love (2020), Di Cristofaro (2023), Meyer (2023), Garofalo &
Maci (2024) and Fusari (2025), while simultaneously building on recent
corpus-assisted discourse-analytic work in health communication and
metaphor studies (Semino et al. 2018, Baker et al. 2019, Brookes &
Baker 2021, Baker 2022, Collins & Baker 2023). What distinguishes the
present volume is that it brings together, under one cover,
methodological guidance and a wide range of empirical healthcare
studies, something which, to my knowledge, has not previously been
achieved in such a systematic manner.
Some limitations are, however, inevitable, and many of them are
explicitly acknowledged by the authors themselves throughout the book
and in Chapter 13. First, not all studies are driven by clearly
predefined research questions established prior to data analysis.
While this may initially seem unsettling to novice researchers, the
authors convincingly argue for the exploratory and flexible nature of
corpus-based research and provide practical guidance on navigating
such ‘turbulent’ analytical trajectories. Second, issues of anonymity
mean that not all corpora include sufficiently rich metadata, which
at times restricts analyses of variables such as gender or age. In
addition, although automated tools offer promising support for corpus
annotation, they remain limited in accuracy and reliability,
necessitating time-consuming manual tagging. Importantly, these
challenges are not downplayed; rather, they are openly discussed, and
readers are offered realistic strategies for addressing them.
There are also three technical issues that could be addressed in
future editions. First, the footnote and referencing system lacks
consistency. In some cases, online resources are cited in footnotes
(e.g. Section 13.5), while in others they are embedded in the main
text (e.g. Section 12.3.3). Similarly, although some online resources
are included in the reference lists (e.g. Chapters 11 and 13), others
are omitted, such as the reference cited on p. 144 (ibid.) in Chapter
9, to name just one. Second, there are several typographical errors
in the book, e.g. on pp. 12, 57, 61, 67 and 188 (ibid.). Finally, some
figures (ibid., pp. 6, 22 and 42) are of poor quality, which makes the
information in them difficult to read.
In sum, this new volume is an indispensable resource for anyone
interested in corpus linguistic research, and it has met my
expectations. It offers valuable insights into corpus compilation,
analysis and interpretation, and it demonstrates how empirical
findings can be articulated both theoretically and in terms of
practical applicability. By addressing healthcare discourse from both
synchronic and diachronic perspectives, the book makes a substantial
contribution not only to corpus linguistics but also to healthcare
studies more broadly. Whether for novice readers or experienced
researchers, this book will be of lasting value.
REFERENCES
Baker, Paul. 2022. Analysing Language, Sex and Age in a Corpus of
Patient Feedback: A Comparison of Approaches. Cambridge: Cambridge
University Press. https://doi.org/10.1017/9781009031042
Baker, Paul; Brookes, Gavin & Evans, Craig. 2019. The Language of
Patient Feedback: A Corpus Linguistic Study of Online Health
Communication. London: Routledge.
https://doi.org/10.4324/9780429259265
Brookes, Gavin & Baker, Paul. 2021. Obesity in the News: Language and
Representation in the Press. Cambridge: Cambridge University Press.
Collins, Luke & Baker, Paul. 2023. Language, Discourse and Anxiety.
Cambridge: Cambridge University Press.
Di Cristofaro, Matteo. 2023. Corpus Approaches to Language in Social
Media. London: Routledge. https://doi.org/10.4324/9781003225218
Fusari, Sabrina. 2025. A Corpus Linguistic Approach to Analyzing
“Empathy.” London: Routledge. https://doi.org/10.4324/9781003632399
Garofalo, Giovanni & Maci, Stefania. 2024. Investigating Discourse and
Texts through Corpus-Assisted Discourse Studies (CADS). Lausanne:
Peter Lang.
Love, Robbie. 2020. Overcoming Challenges in Corpus Construction: The
Spoken British National Corpus 2014. London: Routledge.
https://doi.org/10.4324/9780429429811
McEnery, Tony & Hardie, Andrew. 2012. Corpus Linguistics: Method,
Theory and Practice. Cambridge: Cambridge University Press.
Meyer, Charles F. 2023. English Corpus Linguistics: An Introduction.
Cambridge: Cambridge University Press.
Semino, Elena; Baker, Paul; Brookes, Gavin; Collins, Luke & McEnery,
Tony. 2025. Applying Corpus Linguistics to Illness and Healthcare.
Cambridge: Cambridge University Press.
Semino, Elena; Demjén, Zsófia; Hardie, Andrew; Payne, Sheila & Rayson,
Paul. 2018. Metaphor, Cancer and the End of Life: A Corpus-Based
Study. London: Routledge.
ABOUT THE REVIEWER
Anastasiia Petrenko is a PhD Candidate in Theoretical and Applied
Linguistics at the University of Cambridge, where she is writing a
thesis on the concept of time and temporal adverbs in different
languages and teaches semantics and pragmatics to undergraduate
students. Anastasiia’s research interests span semantic and pragmatic
ambiguity, corpus studies, discourse analysis and cross-linguistic
variation.

------------------------------------------------------------------------------

********************** LINGUIST List Support ***********************
Please consider donating to the Linguist List, a U.S. 501(c)(3) not for profit organization:

https://www.paypal.com/donate/?hosted_button_id=87C2AXTVC4PP8

LINGUIST List is supported by the following publishers:

Bloomsbury Publishing http://www.bloomsbury.com/uk/

Cambridge University Press http://www.cambridge.org/linguistics

Cascadilla Press http://www.cascadilla.com/

De Gruyter Brill https://www.degruyterbrill.com/?changeLang=en

Edinburgh University Press http://www.edinburghuniversitypress.com

European Language Resources Association (ELRA) http://www.elra.info

John Benjamins http://www.benjamins.com/

Language Science Press http://langsci-press.org

Lincom GmbH https://lincom-shop.eu/

MIT Press http://mitpress.mit.edu/

Multilingual Matters http://www.multilingual-matters.com/

Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/

Netherlands Graduate School of Linguistics / Landelijke (LOT) http://www.lotpublications.nl/

Peter Lang AG http://www.peterlang.com

SIL International Publications http://www.sil.org/resources/publications

----------------------------------------------------------
LINGUIST List: Vol-37-1118
----------------------------------------------------------