30.2667, Review: English; Indo-European; Computational Linguistics: Hoffmann, Sand, Arndt-Lappe, Dillmann (2018)

The LINGUIST List linguist at listserv.linguistlist.org
Mon Jul 8 14:55:34 UTC 2019

LINGUIST List: Vol-30-2667. Mon Jul 08 2019. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Jeremy Coburn <jecoburn at linguistlist.org>

Date: Mon, 08 Jul 2019 10:55:04
From: Elen Le Foll [elefoll at uos.de]
Subject: Corpora and Lexis

Book announced at http://linguistlist.org/issues/29/29-1788.html

EDITOR: Sebastian Hoffmann
EDITOR: Andrea Sand
EDITOR: Sabine Arndt-Lappe
EDITOR: Lisa Marie Dillmann
TITLE: Corpora and Lexis
SERIES TITLE: Language and Computers
YEAR: 2018

REVIEWER: Elen Le Foll, Universität Osnabrück


The papers in this volume were first presented at the 36th ICAME conference
which took place in Trier in May 2015. The title of the publication echoes the
title of the conference: ''Words, Words, Words – Corpora and Lexis''. The
editors, Sebastian Hoffmann, Andrea Sand, Sabine Arndt-Lappe and Lisa Marie
Dillmann, provide a brief introduction highlighting the lexicographic and
pedagogical implications of the paradigmatic and syntagmatic approaches corpus
linguists usually take to researching lexis.

In the opening chapter, ''Modelling Lexical Structures in the Oxford English
Dictionary'', Edmund Weiner, the deputy chief editor of the OED, retraces the
development of the OED's structural information networks from the mid 1980s,
when the original computerisation of the OED was planned, to the present day.
He suggests approaches to developing a dictionary that truly illustrates the
non-linear nature of lexis and the great number of interconnections between

In the second chapter, Antoinette Renouf investigates the circumstances of
word coinage in a large diachronic corpus of UK newspaper writing, where
coinage is deemed ''to be a special case of neologism, distinct in that the
act of creation itself is [the] focus'' (p. 40). To this end, she ran an
automated corpus monitoring system to detect potential coinages and combined
these results as well as those from previous studies to draw up a framework
for a working classification of coinage types.

In ''Synonym Selection as a Strategy of Stress Class Avoidance'', Julia
Schlüter and Gabriele Knappe investigate the influence of rhythm and stress on
the selection of near-synonymous adjectives in English. Their results, drawn
from diachronic data of written and spoken American English spanning almost
two centuries, suggest that adjectives with equivalent meanings but different
rhythmic shapes do not occur equally frequently in all syntactic functions.

The following three chapters in this volume focus on the discourse functions
of specific words in English. Karin Aijmer explores ''Intensification with
Very, Really and So in Selected Varieties of English''. She points to
differences between the frequencies of these intensifiers across spoken
varieties of English from the UK, the US, New Zealand and Singapore and
investigates their common collocates, as well as their individual semantic and
functional profiles.

In the following chapter, John Kirk explores ''The Pragmatics of Well as a
Discourse Marker in Broadcast Discussions'' recorded in Great Britain and
Ireland. The paper aims to apply the methodology of Aijmer's (2013) study of
pragmatic markers to a new corpus (SPICE-Ireland), as well as to re-analyse
the data upon which the model was first developed
(ICE-GB). The author concludes that ''there is nothing peculiarly Irish about
discourse uses of 'well''' (p. 163). Crucially, this paper highlights the
pitfalls associated with assigning well-defined pragmatic function categories
to multi-functional discourse markers in natural corpus data. 

Maïté Dupont's contribution to the volume is ''A cross-register Study of
Connectors of Contrast'' in parliamentary debates, newspaper editorials and
academic writing. She applies the framework of systemic functional linguistics
with its notions of Theme and Rheme to investigate adverbial connector
placement, together with ''the powerful methods and solid empirical basis
afforded by corpus linguistics'' (p. 177). The results appear to show that
there are lexically-primed connectors, whose placement patterns are stable
across registers, and stylistically-primed connectors, which are frequently
polyfunctional and whose position is very likely to be affected by register.

''Towards a Model of Co-collocation Analysis: Theory, Methodology and
Preliminary Results'' by Moisés Almela and Pascual Cantos addresses the issue
of inter-collocational dependency. They demonstrate that collocational
associations may not be contained in the relationship between the node and the
collocate, but rather between collocates themselves. Indeed, the authors argue
that whilst strength of association has been the focus of much attention in
corpus linguistics and is now well captured by many existing collocation
statistics (e.g. t-score, z-score, MI, logDice), the mode of association
– which they define as ''the configuration of relations between the internal
structure of the collocation and the domains of lexical attraction that can be
identified in a collocational window'' (p. 213) – has largely been ignored.
Almela and Cantos thus introduce a new category, the co-collocate, and present
a step-by-step methodology to extract these, illustrating the method and the
kind of results it can yield with the lexeme 'consequences'.
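
For readers less familiar with the strength-of-association statistics listed
above, the following minimal sketch shows how they are commonly computed from
raw counts. It is not taken from the volume: the function name
association_scores is hypothetical, the counts are invented, and the formulas
are the simplified textbook versions, which ignore the size of the
collocational window.

```python
import math


def association_scores(o, f_node, f_coll, n):
    """Return common association scores for one node-collocate pair.

    o      -- observed co-occurrence frequency of node and collocate
    f_node -- corpus frequency of the node
    f_coll -- corpus frequency of the collocate
    n      -- corpus size in tokens
    """
    # Expected co-occurrence frequency if the two words were independent.
    e = f_node * f_coll / n
    return {
        "MI": math.log2(o / e),
        "t-score": (o - e) / math.sqrt(o),
        "z-score": (o - e) / math.sqrt(e),
        "logDice": 14 + math.log2(2 * o / (f_node + f_coll)),
    }


# Hypothetical counts, invented for illustration only.
scores = association_scores(o=50, f_node=1000, f_coll=2000, n=1_000_000)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

All four scores describe a single node-collocate pair, which is precisely why
they say nothing about dependencies between the collocates themselves.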

The final two chapters focus on pedagogical applications of lexicogrammar
research. Costas Gabrielatos explores ''The Lexicogrammar of BE Interested:
Description and Pedagogy'' by cross-referencing the information provided by
pedagogical materials (EFL grammars and dictionaries), with results drawn from
a corpus of spoken and written L1 English (BNC) and the patterns found in
English L2 learners' speech and writing (ICLE and LINDSEI). He reports
striking differences in frequencies and patterns of use between L1 and L2
usage and concludes that the results point to a correlation between L2 use of
'BE interested' and its treatment in the pedagogical materials examined. 

The volume closes with a paper by Yves Bestgen and Sylviane Granger
investigating EFL learners' phraseological acquisition processes. In
''Tracking L2 Writers' Phraseological Development Using Collgrams: Evidence
from a Longitudinal EFL Corpus'' they describe their rationale for using
collgrams as their unit of phraseological measure and the methodology used to
extract these. The authors compare the learner data collgrams to L1 data from
the BNC, thereby revealing different patterns of progress across the learning
process, and depending on the types of bigrams. They also compare their
results, drawn from a longitudinal corpus, to those from a comparable
pseudo-longitudinal design and report very similar trends.


The opening chapter provides valuable insights into a lexicographer's
practical considerations in attempting to realise some of the potential of the
lexico-grammatical structures uncovered by decades of corpus linguistics
research. Weiner soberly lists the aspirations formulated in the mid-1980s
that are yet to be realised and does not shy away from proposing fundamental
changes to the structure of the dictionary in order to develop the OED into a
fully explorable digital archive. At the same time, he concludes his chapter
with more realism than we are perhaps used to in academic writing – making it
clear that the changes he suggests will need to come from the publishers
themselves, since it is not a case of the market driving changes. The chapter
also includes a number of full-colour exemplifications of the approaches the
author suggests with example entries.

Renouf's chapter on word coinage is fascinating both in terms of its
methodology and its results. Though some limits of the study are acknowledged
in the closing remarks, it is somewhat surprising to see that register
restriction is not mentioned as a possible limitation. Whilst the typology of
coinage signalling the author arrives at will no doubt be highly valuable for
future research on neologisms, the nature of the corpus used to derive it will
inevitably bias it. Since the corpus queried contained texts from the Guardian
and the Independent, it may be more accurate to conclude that this paper
presents a typology of coinage signalling in UK newspaper writing. The paper
also presents many examples for each type of coinage identified, many of which
are quite entertaining and thus make the chapter a very pleasurable read.

Schlüter and Knappe's paper on the effect of stress on synonym selection packs
a great deal of detail, along with a number of highly informative graphs, into
a single chapter. In spite of the (acknowledged) different degrees of statistical
significance, the results are compelling and very well-explained. It must be
stressed [no pun intended!], however, that the chapter essentially presents a
detailed analysis of four case studies: the synonym pairs 'rich–wealthy',
'glad–happy', 'shut–closed' and the triplet 'fast–quick–rapid'. As a result,
further studies involving other adjectives are necessary to rule out a main
effect factor simply involving the idiosyncratic properties of the lexemes

In the introduction to her chapter on the three intensifiers 'very', 'really'
and 'so', Aijmer states that ''the research questions focused on in this study
are both quantitative and qualitative'' (p. 107). However, the many tables
reporting quantitative results do not make any mention of statistical testing,
which makes it rather hard for the reader to draw any conclusions from these
tables. Qualitatively, the author helpfully provides bullet point summaries of
the most important conclusions from the analyses on each intensifier. Arguably
the most innovative aspect of this study is its attempt to capture the
different uses of intensifiers in several English varieties as
exemplifications of Schneider's (2007) developmental stages in his model of
postcolonial Englishes. Drawn on the basis of just three words, these
parallels can only be very tentative for now, but this study certainly opens
up interesting avenues for further research.
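
To illustrate the kind of test that could accompany such frequency tables,
here is a minimal sketch of the log-likelihood (G2) comparison widely used in
corpus linguistics to compare the frequency of one item in two corpora. It is
an assumption about what such a test might look like, not something reported
in the chapter; the function names and all counts are hypothetical.

```python
import math


def per_million(freq, size):
    """Relative frequency of an item per million tokens."""
    return freq * 1_000_000 / size


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-term log-likelihood (G2) for one item in corpora A and B."""
    total = freq_a + freq_b
    # Expected frequencies if the item were equally common in both corpora.
    exp_a = size_a * total / (size_a + size_b)
    exp_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for obs, exp in ((freq_a, exp_a), (freq_b, exp_b)):
        if obs > 0:  # a zero count contributes nothing to the sum
            g2 += obs * math.log(obs / exp)
    return 2 * g2


# Hypothetical counts of one intensifier in two spoken corpora.
g2 = log_likelihood(freq_a=300, size_a=200_000, freq_b=200, size_b=200_000)
print(f"A: {per_million(300, 200_000):.0f} pmw, "
      f"B: {per_million(200, 200_000):.0f} pmw, G2 = {g2:.2f}")
# With one degree of freedom, G2 above 3.84 corresponds to p < 0.05.
print("significant at p < 0.05" if g2 > 3.84 else "not significant")
```

A test of this kind only addresses a single pairwise frequency difference; it
would not by itself settle questions about functional or semantic profiles.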

As in the preceding paper, Kirk's contribution to the volume provides detailed
results in table form with both raw and relative frequencies, but eschews
statistical test results. Nevertheless, this paper makes a perhaps unique
contribution to advancing corpus linguistics as a discipline, since it is one
of very few studies to attempt to verify (part of) a previous corpus-based
study (Aijmer's analysis of 'well' [2013]) on the very same corpus (ICE-GB).
Not only does Kirk find 127 instances of pragmaticalised 'well' whilst Aijmer
finds 130, but more critically, the two authors arrive at radically different
functional distributions of this same particle. Hence, whilst
computer-generated frequency results may conveniently satisfy corpus
linguists' striving for objectivity, Kirk shows that functional
interpretations are rather less objective than we are often tempted to
believe. He thus invites us all to rethink our analysis procedures if we are
to uphold Leech's (1992) three core principles of corpus linguistics:
verification, replication and objectivity.

Dupont's paper extends the Systemic Functional Linguistics framework by adding
further categories within the Rheme, and the results appear to show that these
new distinctions are indeed constructive. The categories of
placement of connectors are well explained and illustrated with plenty of
salient examples from the corpora examined. However, the tabular results (e.g.
Tables 6.5–6.13) would be much easier to read if they were reported
graphically. Since no shading is used to illustrate significant differences
between percentages, it is rather difficult for the reader to discern either
of the ''two main types of placement profiles'' which the author claims
''emerge from these tables'' (p. 199). 

Almela and Cantos' chapter makes a compelling case for the introduction of
co-collocates in corpus-based lexical research. Their methodology is
well-explained and illustrated, and could provide particularly valuable
insights for lexicographic and pedagogical applications. The main caveat,
acknowledged in the paper itself, concerns the size of the corpus required
for such calculations. The method currently requires mega-corpora (such as
the enTenTen2013 corpus queried for the study), which likely means that it
is, sadly, not yet applicable to specialised corpora, or even to general
corpora of languages other than English for which considerably less online
text is available.

Gabrielatos' paper addresses at least two issues that go above and beyond ''The
Lexicogrammar of BE Interested: Description and Pedagogy''. First, he presents
a compelling framework for pedagogy-driven research which involves comparing
the use of lexicogrammar in L1 and L2 corpora, as well as pedagogical
materials. However, the reader may be surprised to discover that the
pedagogical materials selected for this particular study are in fact reference
works (e.g. English Grammar in Use, Collins COBUILD English Grammar and
Cambridge Dictionary Online) which learners may (or may not) consult as part
of their learning process. Gabrielatos argues that whilst learners may not
actually consult these specific sources, ''they can be expected to be largely
representative of the kind of input L2 learners receive'' (p. 249). Still, one
wonders whether EFL textbooks may have been a more appropriate choice to
capture the language use that learners are frequently exposed to in
instructional settings. Second, with this thorough case study on 'BE
interested', the author lends support to Halliday's conception of lexis and
grammar as ''complementary perspectives'' (Halliday, 1991, p. 32) marking
''the notional ends of a lexicogrammatical continuum'' (p. 244). 

The brief summary of the results of a study by Ädel and Erman (2012) in the
literature review section of the final chapter is somewhat unclear. If the
reader is not already familiar with the study, it is impossible to infer that
the figures reported (130 lexical bundles in native texts and 60 in L2 texts)
refer to the number of bundles uniquely found in only one of the two corpora
studied. Nevertheless, Bestgen and Granger's contribution convincingly
demonstrates the usefulness of collgrams in pedagogical applications and the
study's methodology may well prove influential for future study designs. The
results themselves are difficult to evaluate at group level. The authors
acknowledge a number of limitations – the most important ones being that the
longitudinal corpus used only has two measurement points (first and third year
of study) and the difficulty of accounting for intrapersonal factors when
reporting such group trends (this point is well illustrated in Fig. 9.2). The
comparison of the results from this longitudinal study and a
pseudo-longitudinal one is also a welcome contribution to the field,
especially since we are all acutely aware of the complex and oftentimes costly
processes that longitudinal data collection usually entails. 

In conclusion, corpus linguists can look forward to reading this fine
selection of top-quality papers first presented at the 36th ICAME conference
in Trier. Indeed, the volume provides more than the results of a few
fascinating individual case studies using a range of corpus resources and
state-of-the-art tools: it also explores methodological issues and proposes
new procedures and measures. Moreover, ''Corpora and Lexis'' also contributes
to the refinement and development of (new) theoretical concepts and features
novel applications of corpus-based findings in lexicographic and pedagogical


Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing
by native and non-native speakers of English: A lexical bundles approach.
English for Specific Purposes, 31(2), 81–92. 

Aijmer, K. (2013). Understanding pragmatic markers. Edinburgh University
Press.

Halliday, M. (1991). Corpus Studies and Probabilistic Grammar. In K. Aijmer &
B. Altenberg (Eds.), English Corpus Linguistics: Studies in Honour of
Jan Svartvik (pp. 30–40). London: Longman.

Schneider, E. W. (2007). Postcolonial English: Varieties around the world.
Cambridge University Press.


Elen Le Foll is an English Education lecturer and PhD candidate at Osnabrück
University. Her research interests include learner phraseology, language
learners' use of online resources, textbook English and teacher training. She
also teaches conference interpreting (German-English) at the University of
Applied Sciences in Cologne and works as a freelance conference interpreter.

