36.1796, Reviews: Contrastive Corpus Linguistics: Taieb (2025)

Tue Jun 10 00:05:02 UTC 2025

LINGUIST List: Vol-36-1796. Tue Jun 10 2025. ISSN: 1069 - 4875.

Subject: 36.1796, Reviews: Contrastive Corpus Linguistics: Taieb (2025)

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Joel Jenkins <joel at linguistlist.org>

================================================================

Date: 09-Jun-2025
From: Almontassar Bellah Taieb [almontassar.taieb at gmail.com]
Subject: Computational Linguistics, Discourse Analysis, General Linguistics: Taieb (2025)

Book announced at https://linguistlist.org/issues/35-2700

Title: Contrastive Corpus Linguistics
Subtitle: Patterns in Lexicogrammar and Discourse
Publication Year: 2024

Publisher: Bloomsbury Publishing
           http://www.bloomsbury.com/uk/
Book URL:
https://www.bloomsbury.com/contrastive-corpus-linguistics-9781350385931/

Editor(s): Anna Cermakova, Hilde Hasselgård, Markéta Malá, Denisa
Šebestová

Reviewer: Almontassar Bellah Taieb

SUMMARY
The technological boom and the proliferation of large multilingual
corpora have greatly expanded the breadth and depth of current
linguistic inquiry. These developments have, in turn, made it possible
to investigate language use across typologically and culturally
diverse languages. Contrastive Corpus Linguistics: Patterns in
Lexicogrammar and Discourse capitalises on this methodological
momentum by bringing together cutting-edge work that spans
lexicogrammatical, pragmatic, and discourse-analytical dimensions of
cross-linguistic variation. Its ability to situate findings in the
broader theoretical discussions of contrastive research sets this
volume apart. This work contains eleven chapters organised into two
sections: Lexicogrammar in Contrast (I) and Discourse in Contrast
(II). The organisation reflects the assumption that current
contrastive analyses need to address both the structural
(lexicogrammatical) components of language and the higher-level
discourse and pragmatic phenomena that govern language use in context.
In combining a comprehensive scope with analytical precision, this
collection appeals to a wide readership and opens promising avenues
for future contrastive research.
In this review, I examine the thematic components of each main part
before appraising the strengths and weaknesses of the volume, thereby
offering readers a balanced perspective on its contributions to the
field.
In their introduction, the editors—Hilde Hasselgård, Anna Cermakova,
Markéta Malá, and Denisa Šebestová—celebrate thirty years of
contrastive corpus linguistics and react to some parallel changes
coinciding with the development of this line of research. Against this
backdrop, they recognise the critical influence of Karin Aijmer and
Bengt Altenberg in laying the groundwork for subsequent scholarship.
It is therefore not surprising that the editors had Aijmer inaugurate
this volume herself. In Chapter 1, Aijmer introduces what she calls
the “new contrastive corpus linguistics”—an era characterised by
leveraging hard corpus evidence to identify linguistic similarities
and differences across languages. She highlights the increasing
convergence between contrastive corpus linguistics and pragmatics in
the study of various pragmatic phenomena across languages. While this
trend may seem a necessary concomitant to recent developments in
corpus methods, she points out the relevance of novel types of
parallel corpora in contrastive corpus pragmatics, thus expanding the
field’s scope. Although still nascent, multimodal corpora are becoming
increasingly important in contrastive research on pragmatics and genre
analysis. Aijmer grounds her discussion in the relevant literature to
reflect on current trends while offering insights that the field can
capitalise on for further developments.
Part I. Lexicogrammar in Contrast
Part I of the volume discusses lexicogrammatical phenomena—the
building blocks of language that combine lexical items with
grammatical structures to create meaning. The chapters here analyse
cross-linguistic patterns in several text types and languages, with a
particular emphasis on how they are rendered differently across
linguistic systems. In Chapter 2, Signe Oksefjell Ebeling presents the
results of a contrastive analysis of the cognates see (English) and se
(Norwegian). The purpose of her study is twofold: (a) to analyse the
specific lexicogrammatical behaviour of these perception verbs in
several registers, such as fiction dialogue, fiction narrative, and
football match reports; (b) to evaluate, in a further step, the extent
to which the behaviour of such a cognate pair is language- and/or
register-dependent. The data reveal that the proportional distribution
of their semantic and syntactic categories fluctuates within and
across corpus materials. This exploration not only reinforces the idea
that form-meaning pairings are influenced by context but also
demonstrates the methodological rigour of using corpora to tease apart
subtle relationships.
In Chapter 3, Hilde Hasselgård undertakes a cross-linguistic
comparison of English and Norwegian periphrastic genitives in
fictional and non-fictional texts. These genitives are expressed via a
postmodifying prepositional phrase: English employs the of-genitive
and Norwegian the til-genitive. She demonstrates that the periphrastic
genitive is far more prevalent in English than in Norwegian.
Furthermore, the study provides compelling evidence that the choice of
periphrastic genitives can be conditioned by certain possessive
relations (e.g., body, feature, and kinship) and register-specific
tendencies. The author discusses how the animacy of the possessor
varies across the languages under question: while both human and
animate possessors typically favour til-genitives, periphrastic
of-genitives are typically associated with inanimate possessors to
express a wide repository of meaning relations. Hasselgård attributes
the high occurrence of divergent translations of periphrastic
genitives to differences in the animacy of the possessor and the
nature of possessive-like relations.
Chapter 4 presents a third corpus-based contrastive study by Thomas
Egan. Similar to Ebeling’s approach, the author examines four pairs of
ditransitive verbs (“send/sende”, “bring/bringe”, “lend/låne”, and
“sell/selge”) in the English-Norwegian Parallel Corpus (ENPC). These
verb pairs are found to encode acts of physical transfer while
permitting two alternating constructions: the ditransitive (double
object) and the prepositional dative. Egan reveals a close resemblance
in the use of ditransitive and prepositional dative constructions
across the ENPC. More striking differences, however, emerge when
considering the degree of congruence in the direction of the
translations. Of the four pairs, the “sell” pair exhibits near-total
correspondence in both directions, whereas the remaining cognate
pairs vary considerably. To provide further
insights, Egan expounds on two key factors: (a) the shared syntactic
environment in both languages, which, in this case, increases
translation convergence; and (b) the constraints inherent in the
semantic field of the receiving language. This research provides
valuable insights into cross-linguistic patterns of ditransitive verb
constructions while also highlighting key challenges inherent in their
translation.
Chapter 5 shifts attention to the formulaic nature of newspaper
reports. In their study, Denisa Šebestová and Markéta Malá investigate
frequent prepositional patterns in English and Czech, particularly
those embedded in recurrent word combinations (also known as n-grams)
of varying lengths. The authors compile a lemmatised list of 3–5-grams
featuring “in/v” patterns in both languages and organise their
findings along two major axes: (a) classifying the prepositional
n-grams as phrase-based or clause-based; (b) examining their
text-organising functions in tandem with their recurrent patterns. The
authors illustrate that newspaper reports have a rich repository of
n-grams whose function is not only to mark spatio-temporal meanings
but also to convey communication patterns, express event-related
relationships, and reflect varying degrees of idiomaticity. A sizeable
portion of the chapter, however, explores the textual functions of
multi-word prepositions, revealing their semantic preferences and
evaluative prosodies per language. Overall, the chapter demonstrates
that even closed-class words, like prepositions, may in fact be well
suited for n-gram analysis while acknowledging the challenge of
comparing the phraseology of typologically distinct languages using
the n-gram method.
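For readers less familiar with the n-gram method, the idea can be
illustrated with a minimal sketch. It is not the authors' pipeline:
the function name, the toy sentence, and the choice to work on
whitespace-split tokens rather than a lemmatised corpus are all
assumptions made purely for illustration.

```python
from collections import Counter


def prepositional_ngrams(tokens, target, n_min=3, n_max=5):
    """Count every n-gram of length n_min..n_max that contains `target`.

    Hypothetical helper for illustration; a real study would run on a
    lemmatised corpus and apply frequency and dispersion thresholds.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if target in gram:
                counts[gram] += 1
    return counts


# Invented toy sentence, not data from the volume.
toy = "the match ended in the second half in the second minute".split()
counts = prepositional_ngrams(toy, "in")

# List the most frequent 3- to 5-grams containing "in".
for gram, freq in counts.most_common(3):
    print(" ".join(gram), freq)
```

Running the same function on a second language with its own
preposition (for instance "v" for Czech) would yield two lists that
can then be compared, which is where the typological difficulties the
authors mention come into play.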
Although it may seem distinct in its treatment, Chapter 6 builds on
the preceding chapter by emphasising the role of corpus linguistics in
advancing the scope of phraseology. The authors (Jiajin Xu, Guying
Zhou, Xinlu Liu, Yuanyuan Wei, Ruchen Yu, and Suhua Zhang) undertake a
large-scale comparison of five typologically distinct languages:
Arabic, Chinese, English, Malay, and Swahili. Adopting a
bottom-up corpus-driven approach, they identify p-frames (i.e.,
non-contiguous multi-word sequences with a variable) of the most
frequently occurring 3- to 5-word sequences in news texts. These
sequences are subjected to a systematic comparison in terms of
variability, predictability, and discourse function across the five
languages under question. The analysis reveals that Arabic and Swahili
exhibit an inverse relationship, with statistically differing levels
of variability and predictability in p-frames compared to the other
languages. Moreover, the functional distribution of p-frames indicates
that referential expressions dominate across all five corpora,
followed by stance markers and discourse-structuring expressions. The
authors, however, note that the use of stance markers appears to be a
central dividing line, with an overwhelming preponderance in English.
Overall, this chapter raises an interesting discussion on the
application of the p-frame approach to characterise genre-specific
phraseology.
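The notion of a p-frame can be made concrete with a short sketch. The
definitions below are assumptions for illustration only: variability
is taken as the type-token ratio of the slot fillers and
predictability as the share of the most frequent filler, which are
common operationalisations but not necessarily those used in the
chapter. The function names and toy text are likewise invented.

```python
from collections import Counter, defaultdict


def p_frames(tokens, n=4):
    """Map each p-frame to a Counter of the words filling its slot.

    A p-frame here is an n-gram with one internal position replaced by
    '*'. Restricting the slot to internal positions is an assumption.
    """
    frames = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(1, n - 1):
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return frames


def variability(fillers):
    """Distinct fillers divided by frame frequency (assumed measure)."""
    return len(fillers) / sum(fillers.values())


def predictability(fillers):
    """Share of the most frequent filler (assumed measure)."""
    return max(fillers.values()) / sum(fillers.values())


# Invented toy text, not data from the volume.
toy = ("at the end of the day at the start of the week "
       "at the end of the year").split()
frames = p_frames(toy)
fillers = frames[("at", "the", "*", "of")]
print(dict(fillers), variability(fillers), predictability(fillers))
```

Under these assumed definitions, a frame whose slot is filled by many
different words scores high on variability and low on predictability,
which is the kind of trade-off the cross-linguistic comparison turns
on.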
The concluding chapter in Part I, co-authored by Camino
Gutiérrez-Lanza and Rosa Rabadán, presents a cross-linguistic analysis
of dubbing as a relatively occluded genre in contrastive research. The
authors underscore the inherent challenges of audio-visual
customisation, which involves not only creating discourse but also
adapting it in linguistically and culturally appropriate ways for the
target language. Notably, the tension between isochrony, lip-syncing,
and prefabricated orality has led to the rise of what the authors term
“dubbing-lect features” in the English-Spanish audiovisual industry.
Using data from a novel type of parallel corpus, the study
investigates the rendering of the English modals “can/could” and of
subject pronouns in Spanish dubbing. The authors extract instances of
“can/could” translated as “poder” and group them according to their
respective functions. Cases where “can/could” are rendered using
alternatives other than the redundant “poder” elucidate the range of
semiotic resources available in non-translated Spanish (e.g.,
“saber”). On the other hand, the discussion addresses the problematic
transfer of certain dubbing features, which may inadvertently create
meanings and patterns different from those in non-translated Spanish.
Towards the end, the authors endorse the use of some subject pronouns
for their adjusting role, even though they are at times unwarranted, a
role that is evident in the distinct functions they serve.
Part II. Discourse in Contrast
Following the exploration of lexicogrammar, Part II turns to Discourse
in Contrast. It covers a wide range of topics, from expressing
politeness (English and Norwegian) and coherence relations (English
and French) to speech reporting (English, Czech, and Finnish) and
punctuation stylistics (English, Swedish, and German).
In Chapter 8, readers revisit the comparison between English and
Norwegian, albeit this time from a social-functional perspective.
Stine Hulleberg Johansen and Kristin Rygg analyse the English request
marker “please” in comparison with its Norwegian counterparts in the
ENPC, identifying three primary functions: as a ritual frame
indicating expression, a politeness marker softening requests, or a
request marker strengthening the directive force. The frequency
analysis shows that the distribution of these functions varies
depending on the interaction types (i.e., interpersonal versus
communal) and the situation types (i.e., standard versus
non-standard). The authors go further to reveal that Norwegian
possesses a rich repository of about twelve request markers (e.g.,
“er du/de snill”, “vennligst”, and “vær så god”) that show different
patterns of frequency across situation types. Notably, “please” can
appear in various positions within a sentence and often corresponds to
“vær så snill” or is simply omitted in Norwegian translations.
Furthermore, the Norwegian translation equivalent “er du snill” stands
out: it typically appears in unit-final position and, equally
importantly, carries a stronger illocutionary force than “please” in
the same position. This finding illustrates that even a single lexical
item can take on multiple sociopragmatic functions that may not be
directly transferable across linguistically and culturally distinct
systems. Overall, the chapter provides a nuanced account of how
“please” functions within English and how its Norwegian equivalents
are not entirely isomorphic in their functions, enriching our current
understanding of cross-linguistic politeness.
Chapter 9 takes a step further into the study of coherence marking,
focusing on the analysis of a spoken genre in two languages. This
contribution can be said to complement earlier analyses from Chapters
2 and 4 in showing that formal similarity does not necessarily equate
to pragmatic or stylistic equivalence (see also Chapter 10). In
extending the focus from core lexical and syntactic resemblances to
the discourse-pragmatic level, this shift underscores that even
cognates or functionally similar forms can diverge significantly
across languages based on their distribution and rhetorical roles.
With this in mind, Diana Lewis scrutinises the use of connectives in a
comparable corpus of French and English journalistic interviews.
Connectives, as useful anchoring tools for maintaining coherence in
text, are examined based on whether certain relation types are marked
more often than others. Central to Lewis’s hypothesis is that the
perceived compatibility between ideas lies on a cline, which
influences the frequency and distribution of coherence marking. The
classification procedure shows three major functional types:
causative, contrastive, and additive. Initial findings on the
type-token distribution indicate that French is slightly more
connective-heavy than English. Importantly, while continuous
relations are generally assumed to require less explicit marking,
such marking occurred significantly more frequently in the French
dataset, indicating a more ‘aesthetic preference for formal variation’
in political interviews. The comparison of the English connective
“then” and its French counterparts “alors” and “puis” elucidates a
peculiar semantic shift: connectives expressing temporal meanings may
undergo a differential process of grammaticalisation that renders them
resultative- and additive-like. The remainder of the chapter sheds
light on their shared temporal and resultative meanings and dissects
their divergent ‘weaker’ senses. The author concludes with an analysis
of the functional distribution to accentuate how their
discourse-organisational uses are shaped according to
language-specific preferences and genre conventions. Prospective
readers are encouraged to read the full text for an in-depth
exploration of these finer points.
Chapter 10 follows on from Ebeling’s analysis in this volume, yet
also offers a unique perspective through its dual focus on
characterising a subset of reporting verbs in prose and exploring the
effects of translation when these verbs are rendered into different
languages. The authors, Anna Cermakova and Lenka Fárová, begin by
examining the lexicogrammatical patterning of the English reporting
verb “said” in its past tense form, followed by an analysis of its
translated equivalents in Czech and Finnish. The study distinguishes
between two types of occurrences, in which “said” either is modified
by a specific class of non-finite clauses or remains unmodified. In
English, the highly frequent “said” serves a reporting function and
exhibits notable idiosyncrasies in patterning. Meanwhile, the results
suggest that, irrespective of the language, this reporting device
often occurs in modification patterns encoding meaning beyond its
neutral semantic sense. The second part of their discussion provides a
detailed account of the patterns “with/without-PP” in Czech and
Finnish to underline key contrasts in the translation options
available in the target language. While Czech translators prefer to
avoid near-synonyms (“řekl/řekla”) for the English verb “said” and opt
for lexical blends to foreground the meaning of the PP, Finnish
translators have a greater inclination towards the pragmatically
neutral verb “sanoi”. Furthermore, the
authors investigate the ways in which the “with/without-PP” patterns
are mapped in the target texts. Overall, the chapter offers a nuanced
understanding of how language and authors’ stylistic preferences are
likely to influence the selection and interpretation of reporting
verbs.
The concluding chapter is co-authored by Jenny Ström Herold and Magnus
Levin, who initiate an interesting discussion on the use of the dash
as a meaning-bearing device in nonfiction across English, German, and
Swedish. In line with the volume’s overall aim of mapping
cross-linguistic variation, Ström Herold and Levin seek to determine
the function, form and positioning of dash-introduced segments in both
original and translated texts. A further aim is to establish whether
translators deploy language-specific conventions for rendering dashes
into the target language, or whether the influence of the source
language persists after translation. The preliminary analysis reveals
that the original texts differ in their level of dash use, with German
exhibiting the highest frequency of dashes, followed by Swedish and
English. They attribute the German predilection for dashes to the
grammatical role they serve in marking subordinate clauses.
Importantly, the classification of the dash-introduced
segments underscores their multifunctionality and highlights some
general trends. Regarding their forms and positioning, a more complex
picture emerges with English favouring sentence-medial and
sentence-final positions vis-à-vis German and Swedish tendencies for
sentence-final positions. The authors isolate three types of
strategies in dash-translated texts (retention, omission, and
insertion) and expound on the general principle of balancing the
preservation of the source text’s typical style with adaptation to
target-language punctuation norms. Evidence from this work underlines
the ongoing exploration into the ways punctuation markers and certain
stylistic features vary across languages.
EVALUATION
Taken together, the topics, languages, and approaches covered in this
volume span a wide range. The individual contributors have utilised
state-of-the-art corpus techniques to empirically substantiate their
analyses. One of the most commendable strengths of Contrastive Corpus
Linguistics: Patterns in Lexicogrammar and Discourse is the due
emphasis on establishing a tertium comparationis, that is, a common
basis for aligning observations across languages. This approach
enables researchers to systematically compare linguistic patterns or
phenomena across languages; without it, it would be impossible to
ensure that the items or structures in question are truly comparable.
Another strength lies in the book’s comprehensive scope. As shown in
the previous sections, the juxtaposition of micro-level grammatical
analysis with macro-level analysis of meaning and use adds to
establishing the intricate connection between structure and function.
This dual focus is particularly important in a field increasingly
characterised by embracing an integrated approach. The treatment of
several typologically distinct systems deserves close attention. By
including a wide array of languages from closely related Indo-European
pairs (English and German) to more distant systems (Arabic and
Swahili), this collection offers an insightful description of the
complexity of cross-linguistic patterns and structural variation. The
breadth of sampling not only expands the research foci of current
contrastive studies but also prompts researchers to examine how
language-specific factors interact and shape our methodological
practices.
It cannot be emphasised enough, however, that current work in
contrastive corpus linguistics should take a moment to reflect on the
field’s rapid growth and the diversity of its approaches. In keeping
with this perspective, the volume opens up a space to pay homage to
the historical foundations of the field while acknowledging the
contributions of key figures, like Karin Aijmer and Bengt Altenberg.
These important episodes help steer readers towards the points where
the editors deftly navigate emerging trends, including multimodal
corpora and genre-based analyses. The picture this volume paints is
unique, blending tradition and innovation as well as ensuring that it
is both a retrospective account and a blueprint for future research.
Despite its focus and rigour, the volume has a number of limitations
that need to be addressed. One potential weakness lies in the uneven
depth of analysis across chapters. While some studies provide a highly
detailed account of their subject matter, others, particularly those
spanning multiple languages or large corpora, may at times read as
more descriptive than analytical. This variability warrants attention as it
may pose a challenge for readers seeking a uniform treatment of all
the topics covered.
Another limitation relates to the accessibility of some of the
methodological discussions. Given the advanced statistical techniques
and specialised corpus tools employed in several chapters, readers who
are not well-versed in corpus linguistics may find certain sections
difficult to follow. At times, numerical data embedded without
explicit tabulation were nearly impossible to verify, thereby placing
a greater cognitive load on the reader. Some figures were
not reader-friendly due to their congested layout and tight line
spacing, and because they were reproduced in grayscale, even line-type
variations (dashed, dotted, solid) fell short in providing sufficient
contrast. This presents an additional hurdle since visual
presentations are expected to steer readers towards key trends. There
are also several typographical errors and inconsistencies. It would be
impractical to enumerate all of them, but a few notable examples are
worth highlighting: in the opening chapter, the editors inadvertently
mistake the chronology of the contrastive workshops, citing ICAME 43
instead of ICAME 33; in Chapter 8, a slight typographical error
appears where “face-treat” is used instead of the intended term
“face-threat”.
Perhaps the most significant weakness is that the volume does not
sufficiently chart profitable directions for future research. Although
the volume marks thirty years of sustained research, it does not
convincingly articulate what the upcoming steps might be. Nor does it
seem to engage with recent advances in computational techniques, such
as machine learning or large-scale automatic annotation. This
shortcoming could have been mitigated by including a final note
offering a forward-looking perspective and calling for more concerted
efforts to address current challenges in the field. On another note,
while cross-linguistic comparisons appear to have useful applications
in contrastive corpus research, they are not without limitations. The
issue of translation bias is mentioned in several chapters, but could
have benefited from a more in-depth discussion. A more critical
examination of how translation practices might influence the empirical
findings would have added an extra layer of nuance to the analyses
presented in this volume.
In sum, Contrastive Corpus Linguistics: Patterns in Lexicogrammar and
Discourse stands as a tribute to the power of corpus-based methods in
unravelling the complexities of language use. The hybrid nature of
scholarly discussions across the chapters makes the volume an
essential resource for scholars interested in the
intersection of linguistics, translation studies, and discourse
analysis. By addressing the minutiae of lexicogrammatical
constructions and the broader organisation of discourse, the volume
offers a balanced and multifaceted perspective, both empirically
rigorous and theoretically insightful.
ABOUT THE REVIEWER
Almontassar Bellah Taieb is a PhD student at the Doctoral School of
Linguistics, Pázmány Péter Catholic University. He is particularly
interested in L2 vocabulary studies, language learning strategies,
academic discourse and phraseology. In addition to his research focus,
Almontassar is a university lecturer who teaches courses in
English language and academic skills.
