37.1219, Reviews: Analysing Sociolinguistic Variation: Sali A. Tagliamonte (2025)

Wed Mar 25 22:05:02 UTC 2026

LINGUIST List: Vol-37-1219. Wed Mar 25 2026. ISSN: 1069 - 4875.

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Helen Aristar-Dry <hdry at linguistlist.org>

================================================================

Date: 25-Mar-2026
From: Victoria Beatrix Fendel [vbmf2 at cantab.ac.uk]
Subject: Sali A. Tagliamonte (2025)

Book announced at https://linguistlist.org/issues/36-2509

Title: Analysing Sociolinguistic Variation
Publication Year: 2025

Publisher: Cambridge University Press
           http://www.cambridge.org/linguistics
Book URL:
https://www.cambridge.org/universitypress/subjects/languages-linguistics/sociolinguistics/analysing-sociolinguistic-variation-2nd-edition?format=PB&isbn=9781009403108

Author(s): Sali A. Tagliamonte

Reviewer: Victoria Beatrix Fendel

SUMMARY
Tagliamonte’s Analysing Sociolinguistic Variation consists of a
preface, thirteen chapters, a list of references, and a subject index.
Each chapter contains note boxes (in grey) providing examples, tips
and tricks, and experience reports, thus making the content more
tangible and memorable; it also includes exercises at the end, which,
if followed throughout the book, lead the learner from data collection
to the writing of a research paper. Online resources are provided with
the book here:
https://www.cambridge.org/ag/universitypress/subjects/languages-linguistics/sociolinguistics/analysing-sociolinguistic-variation-2nd-edition?format=PB&isbn=9781009403108#resources.
These are datasets used throughout the book which the reader can
utilise to replicate results or explore the variables in the datasets
further. The book is intended to be “a manual which will take you
through a variationist analysis from beginning to end” (p. 1) and
provides the reader with “the tools you need to do it yourself” (p.
334). These goals are well achieved.
Chapter 1 is the introduction and situates variationist
sociolinguistics within the larger fields of linguistics (with a focus
on the “rules” of a language) and sociolinguistics (with a focus on
the context of use) (pp. 2–5). The chapter introduces the three key
notions that variationist sociolinguistics is based on, i.e. orderly
heterogeneity, continuous language change, and language as
communicating non-linguistic information (pp. 5–7), as well as the key
concepts that are deployed (including the vernacular, the speech
community, the form-function asymmetry, the linguistic variable, the
quantitative method, and the principle of accountability (pp. 7–13)).
Chapter 2 on data collection describes the “best kept secret of
sociolinguistics” (p. 16, also p. 32), that is the fieldwork methods.
It explains the difficulties with random sampling (pp. 18–19 and
21–22) and the ethnographic approach which relies on the researcher
entering social networks of language users whom they want to observe
(pp. 19–21). The chapter then introduces the common technique of
stratified random sampling, i.e. “identif[ying] types of individuals
to be studied” and “seek[ing] out a quota of individuals who fit the
specified categories” before random sampling (pp. 22–25). The chapter
states pointedly that “the study of language … requires stepping out
of the academy and making contact with people who do not share one’s
world view” (p. 24). Common personal associations (e.g. ethnicity) can
mitigate the observer’s paradox (p. 25). The chapter finishes by
unpacking sample design, including the issue to investigate, the
sample size, and the internal sample structure, in order to make it
“defensible, logical, and workable” (p. 30). Ethical concerns include
consent, anonymity, voluntary participation, and access to the
researcher’s findings (p. 30).
Chapter 3 on the sociolinguistic interview describes modular
structuring of questions in an interview with the goal of
“progress[ing] from general, impersonal, non-specific topics and
questions to more specific, personal ones” (pp. 35–38). Generally
speaking, any question “can trigger an emotional response or come
across as offensive” (p. 36), and topics that “the individual wants to
talk about” are best (p. 36). Interview questions will need to be
adapted to speaker communities (pp. 38–43). The phrasing of questions,
attention to in-group information, the natural flow of the
conversation, and letting the individual go off topic are all
highlighted as important (p. 43). Finally,
practicalities such as where to position recording equipment (p. 43)
are discussed.
Chapter 4 on data handling shows how to make data “maximally
accessible and useful” (p. 47). It emphasises the importance of a data
management plan which defines how the components of a corpus
(including the recording files, the interview reports, the
transcription files, etc.) link together (p. 48), the relevance of
metadata, and the importance of consistent labelling. It advises the
researcher to keep transcription “workable”, whether with or without
tools like ELAN (p. 51), and to keep transcription protocols regarding
orthographic convention, transcription conventions for e.g.
hesitations, representation of phonetic realisations, and words that
do not appear in dictionaries (pp. 52–59). The chapter draws attention
to concordance programs like AntConc (pp. 59–60) and indices (pp.
60–61) for initial data processing. Finally, the chapter points out
that file formats may become obsolete over time, something that needs
to be taken into account in order to preserve recorded data,
particularly in the long term (p. 63).
Chapter 5 treats the linguistic variable, that is “two or more ways of
saying the same thing” (p. 65), including instances of its
non-occurrence (p. 67); it describes the situation where “different
forms can be used interchangeably in some contexts even though they
may have distinct referential meanings in other contexts” (p. 68). The
chapter explains how to circumscribe the linguistic variable (pp.
68–71), how to select linguistic variables for analysis (pp. 71–78),
and how to circumscribe the context of the variable (pp. 78–85).
Throughout, examples illustrate concepts and contexts. The chapter
highlights how the analyst’s “decision-making process may impose an
analysis on the data” (p. 85), including the decision about the
type-token ratio. Intra-individual variation is a good guidepost (p.
87).
Chapter 6 on formulating hypotheses and operationalising claims
explains how to systematically assess the dependent variable, i.e. the
variable of interest, in all the contexts where it appears and where
it could have appeared, even if this is particularly complicated for
discourse-pragmatic variables (pp. 90–95). The chapter then introduces
the factor group (or “variable”), which affects whether a variant of
the dependent variable occurs or does not (pp. 95–97). In that sense,
“each factor group can also be thought of as a hypothesis about what
influences the choice process” (p. 95). The hypothesis is subsequently
“operationalised” by devising a test for it (p. 95). The chapter puts
great emphasis on efficient data usage, including coding of variables
and the coding schema (pp. 97–101 and pp. 104–111), and on the
literature review (p. 102), from which observed trends, patterns, and
collocations are good starting points for hypotheses (pp. 103–104).
Chapter 7 on reasons for using statistics explores the basic
assumption that if “grammatical structures incorporate choice as a
basic building block, [that] means that they accept probabilisation”
(p. 113); it further explores the move from the variable rule program
(GOLDVARB), implementing a generalised linear model and providing
statistical significance, relative strength, and constraint ranking of
factors (p. 124), to statistics with R, offering a wider range of
options (pp. 113–126). Sociolinguistic data poses a challenge
statistically speaking as “it is based on language in use, often
highly vernacular registers; and it is badly distributed by nature”
(p. 119). The chapter introduces basic setup and data exploration with
R, including the relevant code (pp. 126–146), although acknowledging
that “despite my best efforts there may still be discrepancies in the
code” (p. 140). A notes column in the spreadsheet is encouraged during
the extraction and coding phases as “relevant notes can be inserted as
guideposts to support and inform data processing at a later stage” (p.
144). The chapter finishes with notes on exclusions, practices of
naming individuals in the data files, namely “true to the place as
well as to the ethnicity of the original name” (p. 147), metadata, and
the lab book (pp. 147–149).
Chapter 8 on distributional analysis explains “pre-statistical
analyses using R” (p. 153) highlighting the transition from the
purpose-built tool for “unbalanced, non-experimental data of most
variationist research” to the multi-purpose R environment (p. 153).
The chapter outlines how to assess the overall distribution of a
variable, distribution types (e.g. contextual distribution vs.
distribution of the environments), the rate of the dependent variable
by independent variables, as well as environments where variables
appear (e.g. genres) (pp. 154–170), showing how to use the relevant R
code on the datasets provided with the book. The chapter pays specific
attention to distribution by community and year of birth (or period of
years of birth) (pp. 170–172). It introduces cross-tabulations,
“commonly used in sociolinguistics to examine the intersection of
factor groups in a data set” (p. 173). The chapter then turns to
adjusting factor groups, e.g. by releveling (changing the order of the
categories in a factor group), renaming (e.g. for labels to be more
interpretable), binning (with year of birth in particular), dealing
with categorical individuals (i.e. those who only use one variant),
variants with low token counts, and unknowns (e.g. education) (pp.
176–191).
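The adjustments described for Chapter 8 can be sketched outside R as well. The following is a minimal Python illustration, not taken from the book: the toy tokens, the speaker names, the variant labels, and the cohort cut-offs are all hypothetical, and the helper names `bin_year` and `rate` are invented for this sketch. It shows binning year of birth, cross-tabulating cohort by variant, and spotting categorical individuals.

```python
from collections import Counter

# Hypothetical toy tokens: (speaker, year_of_birth, variant).
# These are NOT from the datasets provided with the book.
tokens = [
    ("anna", 1948, "variant_a"), ("anna", 1948, "variant_b"),
    ("anna", 1948, "variant_a"),
    ("ben", 1972, "variant_b"), ("ben", 1972, "variant_b"),
    ("cara", 1995, "variant_b"), ("cara", 1995, "variant_a"),
    ("cara", 1995, "variant_b"),
]

def bin_year(year):
    """Bin a year of birth into a coarse cohort (cut-offs are arbitrary)."""
    if year < 1960:
        return "pre-1960"
    if year < 1990:
        return "1960-1989"
    return "1990+"

# Cross-tabulation: token counts for each (cohort, variant) cell.
crosstab = Counter((bin_year(yob), variant) for _, yob, variant in tokens)

# Categorical individuals: speakers who only ever use one variant.
variants_by_speaker = {}
for speaker, _, variant in tokens:
    variants_by_speaker.setdefault(speaker, set()).add(variant)
categorical = sorted(s for s, v in variants_by_speaker.items() if len(v) == 1)

def rate(cohort, variant):
    """Proportion of `variant` among all tokens in `cohort`."""
    total = sum(n for (c, _), n in crosstab.items() if c == cohort)
    return crosstab[(cohort, variant)] / total if total else 0.0

for cell, count in sorted(crosstab.items()):
    print(cell, count)
print("categorical speakers:", categorical)
```

Empty cells in the cross-tabulation (here, `variant_a` in the middle cohort) are exactly the kind of uneven distribution that the chapter's advice on low token counts addresses.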
Chapter 9 on exploratory modelling introduces conditional inference
trees (pp. 193–212) and random forests (pp. 212–223). Conditional
inference tree analysis “reveals how interactions and predictors
operate in tandem”, which is visualised in a tree shape (p. 193).
Branching, or splitting, is done where statistically justified (p.
194). The conditional inference tree needs to be fitted to the data,
balancing accuracy, parsimony, and effect interpretations (pp.
208–209). The chapter shows how “different ctrees offer an alternative
perspective on variable (hwat) [and how] the job of the analyst is
interpretation and explanation” (p. 210). (For Tagliamonte’s hwat data
set, see the data sets provided with the book.) Random forests are
“computationally intensive but high-precision non-parametric
classifier[s]” (p. 212). The method “works through the data set by
trial and error to establish whether a factor group is a useful
predictor of variant choice or not” (p. 212). For this, the factor
groups have to be factorial (p. 213), unlike for conditional inference
trees (p. 218). Crucially, “no two random forests will be precisely
the same … because they are random” and based on bootstrap sampling
(p. 218). The chapter closes by “advocat[ing] for a comparative
approach, the strategy of triangulating across different tools” (p.
222).
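The point about randomness can be made concrete with a small sketch of bootstrap sampling, the resampling step that random forests build on. This is a Python illustration under stated assumptions, not the book's R code: the helper `bootstrap_sample` and the stand-in data are hypothetical, and a real random forest adds tree fitting and random predictor selection on top of this step.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement (one bootstrap sample)."""
    return [rng.choice(data) for _ in data]

# Stand-ins for 20 coded tokens (hypothetical; any list of rows would do).
tokens = list(range(20))

# Runs with different seeds (or none) will usually draw different samples...
sample_a = bootstrap_sample(tokens, random.Random(1))
sample_b = bootstrap_sample(tokens, random.Random(2))

# ...whereas re-using a seed makes a run repeatable.
sample_a_again = bootstrap_sample(tokens, random.Random(1))

print(sample_a == sample_a_again)
```

Recording the seed alongside the results is one common way to make such an analysis repeatable, which ties in with the book's emphasis on replicability.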
Chapter 10 on mixed effects modelling describes fitting a generalised
linear mixed effects model to the data. While GOLDVARB applied
generalised linear models, mixed models facilitate dealing with sparse
and unevenly distributed data and make it possible to deal with random
effects, such as that of the individual (p. 226) or that of words in
the sample (p. 238). Model building involves pooling “your knowledge
of linguistic and sociolinguistic theory, experience, intuition, and
the understanding gained from all the distributional (empirical)
results and the findings from the empirical analyses (Chapter 8) and
exploratory analyses, ctrees and cforests (Chapter 9)” (p. 228).
Previous sum and current treatment coding practices are explained (p.
229). The interpretation of a sample model is unpacked step by step
(pp. 234–248). In essence, “you must balance the goal of finding the
best fit of the quantitative model with the qualitative/interpretative
goal of finding the best explanation” (p. 248).
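The difference between sum and treatment coding can be illustrated with the contrast codes themselves. The sketch below is a plain-Python approximation, not the book's code: the function names and the factor group `levels` are hypothetical. Under treatment coding each level is compared with a reference level; under sum coding the codes in each column sum to zero, so coefficients are read as deviations from the mean across levels.

```python
def treatment_coding(levels):
    """Treatment (dummy) coding: the first level is the reference (all zeros)."""
    return {
        level: [1 if level == other else 0 for other in levels[1:]]
        for level in levels
    }

def sum_coding(levels):
    """Sum (deviation) coding: the last level is coded -1 in every column."""
    coded = {}
    for level in levels:
        if level == levels[-1]:
            coded[level] = [-1] * (len(levels) - 1)
        else:
            coded[level] = [1 if level == other else 0 for other in levels[:-1]]
    return coded

# Hypothetical factor group with three factors.
levels = ["young", "middle", "old"]

for name, scheme in (("treatment", treatment_coding(levels)),
                     ("sum", sum_coding(levels))):
    print(name)
    for level in levels:
        print(" ", level, scheme[level])
```

Which level serves as the reference under treatment coding is an analyst's decision, which is one reason releveling (Chapter 8) matters for the interpretation of a model.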
Chapter 11 on visualisation describes how to “make the numbers
interpretable” by means of various ways of visualising the data (p.
250). It begins by illustrating how to use bar charts (pp. 251–257)
and line plots (pp. 258–259), e.g. in order to visualise
cross-tabulations. It moves on to visualising ctrees (pp. 266–268) and
random forests, including variable importance plots (pp. 269–274). It
shows how to produce cow plots, i.e. taking multiple plots to combine
them into a complex plot (pp. 274–276). The chapter illustrates
various ways to visualise Glmers (generalised linear mixed effects
models), which are often “difficult to understand” (pp. 276–289).
Finally, the chapter shows how to use ribbon plots in order “to
visualise change in progress” (pp. 289–298) and box plots “to explore
dispersion and diffusion” (pp. 298–300).
Chapter 12 on interpreting and reporting results begins with the three
lines of evidence, i.e. statistical significance, relative strength,
and constraint ranking with regard to categories within a factor group
(pp. 302–318). With regard to the latter, “the hierarchy of
constraints constituting each factor is taken to represent the
variable grammar” (pp. 308–309). Thus also, “if two varieties do not
share the same constraint hierarchies, then universals have
effectively been ruled out” (p. 311). The chapter then highlights the
importance of “comparative sociolinguistics”, that is “a consistent
comparison of these lines of evidence, but with the addition of two or
more relevant bodies of material to compare and/or contrast” (p. 318).
The chapter finishes with some common scenarios, including “tracing
the history and origins of varieties”, considering similarities and
differences across data sets, the impact of individuals and
idiosyncrasy (pp. 319–322), and advice on how to report results,
including the issues of replicability and comprehensibility to the
audience (pp. 322–325).
Chapter 13 on finding the story is about “how it all fits together”
(p. 327). It is (as the exercise at the end of the chapter also
confirms) in essence a run-through of how to write up the research
that has been done, taking into account findings, tying up loose ends,
“offer[ing] a plausible explanation for the results”, and addressing
objections and the weakest bits (pp. 327–330). It emphasises the
importance of “situat[ing] and interpreting beyond the analysis
itself” (p. 331). The chapter quite pointedly describes a good
research paper as having “cogent argumentation, solid evidence, and a
chronologically ordered, unfolding narrative, which may even contain
conflict, climax, and resolution”, i.e., as telling a story well (p.
332). The chapter finishes with some remarks on oral presentations and
how they differ from the written version of the research (pp.
332–334).
EVALUATION
The book is the updated second edition of Tagliamonte (2006). In
contrast to the 2006 edition, Chapters 8 to 12 in particular are
based on statistics with R rather than the variable rule program
(GOLDVARB). The book describes variationist sociolinguistics at the
start as “practical, replicable, and contextualised” (p. 13). All
three aspects noticeably run through the way the book presents
content.
To begin with the practical aspect, or the grounding in the real
world, the book places significant weight on balancing budget and
research (p. 26), on tweaking processes to situational settings (pp.
31 and 39), and on the hundreds of files that are often created in
studies of this kind (p. 191), to name only a few examples. Especially
in an introductory textbook, this groundedness in the real world is
refreshing and fundamentally important so as to make the methodology
in particular tangible and relatable. From a practical perspective, it
is worth pointing out that Chapters 8 to 12 in particular are written
as a manual for using R in sociolinguistic analysis (rather than SPSS,
Python, etc.). The datasets that the reader is provided with are used
throughout the book in order to guide the reader through exercises and
tasks. I believe that these could be manipulated equally well with
SPSS or Python; yet the reader would need to map the task carried out
in R onto these differing coding environments via their documentation.
To move on to replicability, parts of the book are hugely beneficial,
especially the mapping of statistical concepts onto sociolinguistic
realities or the mapping of recording standards onto sociolinguistic
concepts.
Here are three examples: (Ch. 7 p. 138) “The factor groups are columns
in Excel; the factors are the categories in each column.”; (Ch. 9 p.
196) “The dependent variable is the variable you are trying to
explain.”; (Ch. 12 p. 320) “Individuals or words (e.g. nouns, verbs,
or adjectives) are typical examples of random effect factors.”
Furthermore, methodological caveats like the ones about zero tokens as
a “pitfall of orthographic transcription practices” (p. 62) and the
correlation between the number of tokens and the tendency “to detect
more factors to be statistically significant” (p. 306) are brilliant
for learners. In some cases, such mappings and caveats would help
replicability possibly more than the extensive coding sequences, which
can become outdated relatively quickly when there are updates to the R
environment; such updates are usually reflected in the R documentation
immediately but of course would not necessarily be flagged to the
reader of this book. I wonder whether encouraging the reader even more
to use the R documentation alongside the book throughout might
mitigate this issue.
To finish with contextualisation, specifically the impact of the
analyst’s decision-making in the data collection, analysis, and
interpretation is highlighted throughout and the pitfalls of
underestimating this impact are variously shown, e.g. “even the most
sophisticated quantitative manipulations will not be able to save the
analysis if you do not do this (sc. make principled decisions) first”
(p. 79), “the results will mean very little if you did not have a
hypothesis motivating the test in the first place” (p. 101), “there is
also no assurance that statistical significance is linguistically
meaningful” (p. 110), “the process of finding the best analysis for
the data is multidimensional and not entirely statistical” (p. 125),
“the adage of ‘garbage in, garbage out’ applies” (p. 213), and “the
explanation is in the hands of the analyst to pursue” (p. 257).
Especially in conjunction with its focus on comparative approaches
(see Chapters 12 and 13), this emphasis on the importance of
decision-making gives a clear idea of the role the analyst plays, with
regard to data, tools, and the final story to tell. This is important
also as the analyst is perfectly described at the start as follows: “a
sociolinguist (sc. unlike a linguist) is more likely not to ask a
question at all [but] will just let you talk about whatever you want
to talk about and listen for all the ways you say X” (p. 5).
Sociolinguists are oftentimes observers, conscious of the complex
link between language and identity, and highly aware of the
situational settings and dynamics.
I do not catalogue typographic issues in detail but noticed them
throughout, with quite a lot in Chapter 1 in particular. In some
sentences, they garden-pathed the reader initially and required a
double take. In Chapter 9, figure
9.9 is actually in colour despite the disillusioned comment by the
author “were we to have colour in this book” (p. 207). The online
resources can be found at the address given at the start of the summary,
not at the address stated in Chapter 1. None of these minor issues
takes away from the quality of the book in a significant way.
The book is, as intended, a manual of sociolinguistic variation
analysis including a wealth of real-world experience and advice (p.
334). This makes it possible to map the methodology shown onto very
different kinds of languages, such as corpus/text languages without
living native speakers (Fleischman 2000; Bentein 2019). It also makes
it possible to deal with phenomena that often cause difficulties, such
as zero tokens (p. 62), as in embedded main-clause phenomena, or
multi-lexemic units, as in multi-word expressions (p. 54) (Djärv 2022;
Savary et al. 2018), also in corpus/text languages (Rosén 2011; Fendel
2025: esp. 270–273). Thus, while written primarily for beginners, the
book has many layers and something to offer to a wide range of
readers.
REFERENCES
Bentein, Klaas. 2019. Dimensions of social meaning in Post-classical
Greek: Towards an integrated approach. Journal of Greek Linguistics
19(2). 119–167.
Djärv, Kajsa. 2022. On the interpretation and distribution of embedded
main clause syntax: new perspectives on complex discourse moves.
Glossa: a journal of general linguistics 7(1).
Fendel, Victoria. 2025. Giving gifts and doing favours: Literary
classical Attic Greek support-verb constructions. Leiden; Boston:
Brill.
Fleischman, Suzanne. 2000. Methodologies and ideologies in historical
linguistics: On working with older languages. In Susan Herring, Pieter
van Reenen & Lene Schøsler (eds.), Textual parameters in older languages,
33–58. Amsterdam: John Benjamins.
Rosén, Hannah. 2011. Zeroing in on Latin asyndesis. STUF - Language
Typology and Universals 64(2). 136–147.
Savary, Agata, Marie Candito, Verginica Mititelu, Eduard Bejček,
Fabienne Cap, Slavomír Čéplö, Silvio Cordeiro, et al. 2018. PARSEME
multilingual corpus of verbal multiword expressions. In Stella
Markantonatou, Carlos Ramisch, Agata Savary & Veronika Vincze (eds.),
Multiword expressions at length and in depth: Extended papers from the
MWE 2017 workshop, 87–147. Berlin: Language Science Press.
Tagliamonte, Sali. 2006. Analysing sociolinguistic variation. 1st ed.
Cambridge: Cambridge University Press.
ABOUT THE REVIEWER
Victoria B. Fendel (D.Phil. Oxford, 2018) is a lecturer at Lady
Margaret Hall, University of Oxford, one of the editors of the
Registered Reports in Linguistics, and language leader for Ancient
Greek in the PARSEME/UniDive COST initiative. Her research focusses on
bilingualism and language contact (Oxford University Press, 2022),
multi-word expressions (Brill, 2025) in literary, epigraphic, and
papyrological sources, and on the development of digital tools for
large corpora (Language Science Press, 2024).
