35.1729, Review: Korpusgestützte Sprachanalyse: Beißwenger, Gredel, Lemnitzer, Schneider (eds.) (2023)

Mon Jun 10 21:05:10 UTC 2024

LINGUIST List: Vol-35-1729. Mon Jun 10 2024. ISSN: 1069 - 4875.

Editor for this issue: Justin Fuller <justin at linguistlist.org>

LINGUIST List is hosted by Indiana University College of Arts and Sciences.
================================================================

Date: 10-Jun-2024
From: dominique DIAS [Dominique.Dias at univ-grenoble-alpes.fr]
Subject: General Linguistics: Beißwenger, Gredel, Lemnitzer, Schneider (eds.) (2023)

Book announced at https://linguistlist.org/issues/34.3201

EDITOR: Michael Beißwenger
EDITOR: Eva Gredel
EDITOR: Lothar Lemnitzer
EDITOR: Roman Schneider
TITLE: Korpusgestützte Sprachanalyse
SUBTITLE: Grundlagen, Anwendungen und Analysen
SERIES TITLE: Studien zur deutschen Sprache
PUBLISHER: Narr Francke Attempto Verlag GmbH + Co. KG
YEAR: 2023

REVIEWER: dominique DIAS

SUMMARY

The book Korpusgestützte Sprachanalyse (Corpus-Based Language
Analysis), edited by Michael Beißwenger, Eva Gredel, Lothar Lemnitzer,
and Roman Schneider, explores the foundations and applications of
corpus-based language analysis. This book is dedicated
to Professor Angelika Storrer on the occasion of her 65th birthday,
and is therefore intended to reflect on her career and her main areas
of research. The editors of the volume assume that the possibilities
of data-supported knowledge acquisition and data-driven theory
building have changed the requirements and job profile of linguists.
The volume contains 25 contributions written in German (with the
exception of two in English) divided into 6 parts, each devoted to a
specific aspect of Corpus Linguistics.

The first section of the book offers theoretical considerations on the
link between theory and object of analysis, and how this link
conditions access to data. In his chapter “Linguistics – theories and
empirical approaches”, Ludger Hoffmann reminds us that the purpose of
science is to constantly acquire new and socially relevant,
reality-based knowledge, which is then passed on, i.e.,
socially inherited. In the case of linguistics, this knowledge
concerns human language and how its use forms systems. This
contribution describes various paradigms that have punctuated the
history of linguistics, from functional grammar to pragmatics, via
generative approaches. The contribution is based on the idea that
language can be seen as an object in quite different ways and that
this requires different types of empiricism and methods of data
analysis.

The second chapter “Infinity Corpus – Linguistic megalomania played
out for once” is a reflection on the possibility of exploring large
amounts of data, an almost infinite corpus. Ulrich Schmitz recalls
that from the 1960s onwards, computers opened up the possibility of no
longer using only the always-limited and subjective intuition and
expertise of individual researchers as an empirical source. Even
though he appreciates the possibility of searching large datasets for
regularities and patterns in language use, he considers the idea of an
infinite corpus to be unrealistic, unworldly, superfluous, inefficient
and stupid. According to the author, Corpus Linguistics must first raise
questions in order to gain knowledge. The appropriate type and
meaningful scope of the corpus depend on the question asked.

The second part of the book is devoted to the collection and
processing of language corpora. In the opening chapter of the section
entitled “DeReKo in the context of German-language contemporary
corpora”, Marc Kupietz, Harald Lüngen and Andreas Witt present the
DeReKo (short for Deutsches Referenzkorpus, German Reference Corpus)
and the possible strategies for its extension and integration in
research infrastructures. The DeReKo, which is based at the Leibniz
Institute for the German Language (IDS), involves building a vast,
automatically annotated German text corpus along with corresponding
tools for annotation and exploration. Important aspects of the
development of a future German-language reference corpus would be, for
instance, a platform where citizens can donate text data, or components
for the identification of translated text and machine-authored text.

In the contribution entitled “What do today’s corpora offer?”, Henning
Lobin explains how the availability of digital text corpora has
reinforced the shift from system-oriented to usage-oriented language
research. Each text type brings its own challenges in terms of
corpus construction: social media data, for instance, which are
characterized by spontaneous language, provide access to a previously
less accessible and neglected type of language. According to the
author, linguistic corpora should include less readily available text
types in order to offer the possibility of drawing a realistic picture
of language use in a community.

In the next chapter, Aleksandra Pushikan and Erhard Hinrichs present a
learner corpus, “The IVK-LER corpus of adolescent foreign-language
learners of German”, which is also the title of the chapter. The
authors explain how the corpus was built: it comprises 117
student-written texts collected between 2020 and 2021, consisting of
weekly writings produced by a group of 18 adolescents in a preparatory
class. The presentation points out that corpora dealing with languages
other than English are rare, and it gives an overview of learner
corpora for German as a foreign and second language. The annotation of
the corpus aims not only to detect syntactic and grammatical
difficulties for L2 learners
but also to guide textbook authors and teachers in creating
appropriate teaching resources.

In the last chapter of the section, “Naturalness vs. richness vs.
comparability”, Uta Quasthoff raises the question of the compatibility
of the different constraints to be followed in the constitution of
corpora. She argues that the naturalness of data does not always
correspond to research questions that go beyond the descriptive
reconstruction of everyday interaction. The author outlines the basic
orientation and questions of interactional discourse analysis as a
theoretical framework and then describes a concrete example based on
previous studies she has carried out on acquisition processes by
primary school children.

The next section, entitled “Text corpora: Studies and applications”,
brings together contributions relating to different corpora of
standard written language. In the first chapter “Adaptability and
accentuation”, Ludwig M. Eichinger analyzes complex adjectives with
the first element {gender-}, using data from the DeReKo corpora. He
observes on many levels the paths of integration of this element,
which entered the German language not too long ago. In particular, the
analysis of the most frequently used word gendergerecht (gender-fair)
demonstrates how such a formation develops its usage and how patterns
thereby tend to change.

With the next chapter “Argument structures in expressionist lyric
poetry”, Stefan Engelberg shows how Corpus Linguistics can help in
rejecting the theory of deviation, according to which the language
used in poems is significantly different from non-lyrical language.
His work is dedicated to the conspicuousness of valency in 320
German-language poems of Expressionism. The author argues that
argument-structural innovation in poems should be analyzed within
pattern and construction-based approaches, which the deviation theory
calls into question.

In the chapter “Knowledge spaces of journals in articles, issues, and
series of issues. Text organization, multimodality, word usage”,
Thomas Gloning examines the connection between linguistic-textual
organization and knowledge transmission in journals. On the one hand,
he tries to show how aspects of text organization and
linguistic-multimodal design are related to the goals of knowledge
dissemination. On the other hand, he explores the way in which the
thematic-functional profile of a journal can be reconstructed in the
serial sequence of its issues.

In the contribution entitled “20 Years of Word Watch”, Lothar
Lemnitzer takes stock of the project Wortwarte (Word Watch), which he
initiated in 2000 with his colleague Tylman Ule and which ended in 2020.
The project aimed to collect and document new words in German,
focusing on neologisms found in German newspaper texts. The Wortwarte
was a pioneering achievement in the sense that the project recorded
new words and provided data for lexicologists and lexicographers
studying language change.

It is another platform that Frank Michaelis, Carolin Müller-Spitzer,
Jan-Oliver Rüdiger and Sascha Wolfer present in their chapter “Filter,
explore, compare: new corpus tools and instructive potentials of
OWIDPLUS”. OWID is a portal of the Leibniz Institute for the German
Language (IDS) in Mannheim, which provides dictionaries with
different thematic focuses, based on extensive empirical data.
OWIDPLUS is an extension of OWID, offering additional content and
features to explore data. The authors emphasize that the multitude of
potential research questions constantly leads to a heterogeneous
landscape of resources and the need for new tools.

In the chapter “Inductive or intuitive? The extraction of frames from
mathematical proof-texts”, Bernhard Schröder examines a concrete
example, the case of mathematical proof-texts. He explains that
referential ambiguities are minimized in the language of mathematics
through the use of mathematical notation, but they do exist. However,
mathematical proof-texts show structuring strategies that can be
modeled with the help of frames. The author stresses that
the corpus-driven approach must be complemented by a corpus-based
approach.

In the next chapter “Pure climate madness! On the conception of a
discourse glossary of climate compounds”, Manfred Stede, Anna-Janina
Goecke, Noël Simmel and Birgit Schneider analyze the conception and
implementation of an online glossary for “climate compounds” in
German. They are not interested in technical terms, but in terms that
can be used in politically motivated discourse and in a subjective way
(choosing to speak about climate crisis instead of climate change, for
instance). That is the reason why the authors use a corpus of texts
produced by climate skeptics and by climate activists.

In the contribution “Corpus findings and grammar using the example of
the genitive in German”, Gisela Zifonun wonders whether corpus-based
research changes the view of the language system or whether it only
provides clarifications and corrections in detail. To shed light on
the debate, she discusses two corpus-based grammatical studies on
the genitive case in German, showing that a grammar does not stand
above the language, but is an interpretative construct for the use of
the language.

The fourth section, “Corpus-based analysis of spoken language”, brings
together two corpus studies on spoken language. The first chapter of
the section, “On the use of metadata in interaction-analytical work
with corpora – The example of an investigation using the FOLK-corpus”
by Arnulf Deppermann and Silke Reineke points out the importance of
metadata for corpus analysis. The authors regret that metadata
concerning interactional events and their participants are used too
rarely in Conversation Analysis and Interactional Linguistics. They
show how metadata can play an insightful role, leading us to
prototypical contexts in which a structure is used particularly
frequently.

The contribution by Rosemarie Tracy and Dafydd Gibbon, “The beat goes
on: a case study of timing in heritage German prosody”, examines the
role of speech rhythm in narrative cohesion through a case study of
fluency and rhythm, conducted on narratives produced by a bilingual
speaker of German as a minority language in an English-speaking
environment in the USA. This case study is an opportunity for the
authors to test several methods, such as the annotation method and
Rhythm Formant Analysis.

As its title suggests, the next section is dedicated to the
“Corpus-based analysis of internet-based communication”. In their
contribution, “Punctuation as an interactional resource”, Michael
Beißwenger and Sarah Steinsiek analyze formal and functional
characteristics of ellipsis points in WhatsApp chats. In this respect,
they take an empirical look at a punctuation mark and its
repragmatization for the purposes of sequentially organized, written
communication.

In the chapter “Linguistic Wikipediology and Wikipedactics”, Leonie
Bröcher, Eva Gredel, Laura Herzberg, Maja Linthe and Ziko van Dijk
present two fields of reflection related to the collaborative online
encyclopedia Wikipedia. Linguistic Wikipediology refers to works in
linguistics that deal with the texts produced for Wikipedia and wikis:
this kind of data makes it possible to study discourses and
interactions, not only from a linguistic point of view, but also from
a cultural or gender studies perspective. The term Wikipedactics
refers to the possibilities offered by Wikipedia for teaching-learning
platforms or didactic use in general.

In the next contribution, “‘I think my pig is whistling’ – A case for
the mobile communication database. Or: The possessive pronoun ‘mein’
from a corpus-based perspective”, Wolfgang Imo examines the German
possessive pronoun “mein” in messenger chats. He presents the
different functions of this pronoun: four categories were formed on a
semantic-functional basis, based on the type of possessive relation
expressed in the phrase containing mein. Among other things, the study
provides an insight into routine forms and phraseologisms.

The last chapter of the section, “The INSTAB-Formula. A proposal for
the creation of Instagram data collections for student work” by
Konstanze Marx, suggests a step-by-step method for collecting
multimodal and ephemeral data from Instagram for student work. The
acronym INSTAB describes the various steps to be followed: IN time,
Speichern, Transkribieren, Annotieren, Bereitstellen (In time, Save,
Transcribe, Annotate, Provide). In this way, students can rely on
instructions that enable them to collect data and generate hypotheses
as part of their work.

The last part of the book, “Corpus-based analysis and promotion of
language skills”, bridges the gap between the linguistic use of
digital corpora and their potential applications in didactic contexts.
Thomas Bartz and Nadja Radtke suggest a new type of task in their
contribution “Use of digital corpora and analysis tools for
material-based writing in German lessons”. In the wake of the poor
performance of German pupils in international assessments of
educational attainment, new national standards are being developed in
Germany. The authors argue that digital corpora could be used to
create new tasks and focus on the writing process.

The chapter entitled “Coordination – (not) a learning problem for
German as a foreign language?” by Eva Breindl is an example of how
learner corpora can be used to identify learning difficulties. Using
the MERLIN learner corpus, the author shows that, contrary to what
one might think, coordinative constructions constitute a difficulty
for learners of German as a foreign language. She analyzes occurrences
of coordinative structures at levels B1, B2 and C2 and finds that
complex structures are underused. Learner corpora can therefore help
not only to identify difficulties but also to improve learning
materials: the
author has also observed discrepancies regarding coordination between
rules and examples in basic grammars.

In the chapter “Corpora for German as a foreign language – potentials
and perspectives”, Carolina Flinz, Ruth M. Mell, Christine Möhrs and
Tassja Weber give an overview of corpora that can be used for learning
German as a foreign language. These include learner corpora,
specialized corpora and multilingual corpora for an intercultural
perspective. They show that corpora can be useful supplements to
existing teaching and learning materials, providing a rich reservoir
of data.

In the contribution “Weil-clauses for learners of German”, Aivars
Glaznieks, Jennifer-Carmen Frey and Andrea Abel examine weil-clauses
(because-clauses) and their syntactic variation in German. They can be
realized as verb-final clauses or in a verb-second word order.
However, this variation is not free, but specific to the register and
medium and can also have different semantic and pragmatic meanings. In
this study, the authors compare the use of these clauses in texts of
immersive and non-immersive learners of German in South Tyrol, a
multilingual Italian region.

The last chapter of the book, “What is, what should be – and why?
Language queries from an empirical-linguistic perspective” by
Christian Lang, Roman Schneider and Angelika Wöllstein, highlights the
potential of studying language queries. Based on a corpus of 50,000
queries on language issues, the study shows how this kind of data is
interesting for analyzing the interactions between linguistic
laypersons and experts. More generally, the question of knowledge
transfer within society is at the heart of this study.

EVALUATION

Editing a book devoted to corpus linguistic analysis may seem a
daunting task, given that the methods, tools and fields of application
of Corpus Linguistics have developed considerably in recent decades.
But one must admit that this book succeeds on many levels. The
structure of the book offers a theoretical reflection based on
concrete examples, and it gives an overview of different fields of
application: written language, oral language, digital writing (and its
multimodal dimension), and language learning.

The theoretical section, in particular, convincingly demonstrates the
extent to which corpus analysis enables us to develop research
questions and methods of analysis that remain heuristic models rather
than absolute truths. In this respect, Ulrich Schmitz’s contribution
provocatively warns against the illusion that corpora can be used to
analyze language exhaustively and infinitely. In a similar vein, the
chapters devoted to methodological issues reflect on Corpus
Linguistics in a dynamic way, not a fixed one: it is a constantly
evolving discipline, whose interests and approaches change as it comes
into contact with its objects of study. In turn, we understand how
corpus study has modified our approach to language, by focusing on
usage and providing access to greater linguistic variety.

It is worth noting that the book is not just a succession of case
studies, even if these do lend a concrete, well-argued dimension to
the subject. Several chapters present the genesis of projects that
existed or are still underway. For instance, Marc Kupietz, Harald
Lüngen and Andreas Witt present the DeReKo, its operating principles
and possible future developments. Lothar Lemnitzer’s contribution also
looks back on the completed Wortwarte project on German neologisms and
looks forward to the future, outlining possible successors to such a
project. These chapters, while not focused on any particular case
study, nevertheless contribute to writing the history of the
discipline. This, of course, makes sense in a book dedicated to the
career of Angelika Storrer, who has contributed over the past three
decades in many ways to the development of corpus-based linguistic
analysis in research and teaching.

A number of chapters have a more informative dimension, presenting one
or more corpora. They provide food for thought by clarifying the
criteria used to build the corpus in question, and by giving examples
of possible studies. In this respect, they are an invaluable resource
for young researchers wishing to discover corpora, and a useful
introduction to methods of analysis. The contribution
by Carolina Flinz, Ruth M. Mell, Christine Möhrs and Tassja Weber, for
example, provides a good overview of corpora currently used in
research into German as a foreign language.

One aspect that could have been developed in the book is that of
contrastive studies and the intercultural dimension of corpus studies.
As it stands, the book focuses mainly on German, and on German as a
foreign language in the final section. However, multilingual corpora
pose challenges in terms of corpus construction and the development of
methodological frameworks that merit further attention.

Generally speaking, the book provides interesting insights into
fundamental questions, current research, and developments in the field
of corpus-based language analysis. It covers theoretical aspects,
descriptions of specific corpora and tools, corpus-based case studies,
and the use of corpora in teaching and learning. The 25 chapters
included in this book mirror the current state of research and are
accessible not only to experts but also to advanced linguistics
students with relevant interests.

ABOUT THE REVIEWER

Dominique Dias teaches Germanic Linguistics at Sorbonne University,
France. He is a member of the research group CELISO, which brings
together researchers specializing in the Germanic, English,
Scandinavian, and Slavic languages. His research interests lie in text
linguistics, text genres, metadiscourses and German media.
