32.2336, Review: Linguistic Theories; Sociolinguistics; Text/Corpus Linguistics: Rüdiger, Dayter (2020)

Fri Jul 9 22:30:06 UTC 2021

LINGUIST List: Vol-32-2336. Fri Jul 09 2021. ISSN: 1069 - 4875.

Editor for this issue: Jeremy Coburn <jecoburn at linguistlist.org>
================================================================

Date: Fri, 09 Jul 2021 18:29:46
From: Ylva Biri [ylva.biri at helsinki.fi]
Subject: Corpus Approaches to Social Media

Book announced at http://linguistlist.org/issues/31/31-3766.html

EDITOR: Sofia Rüdiger
EDITOR: Daria Dayter
TITLE: Corpus Approaches to Social Media
SERIES TITLE: Studies in Corpus Linguistics 98
PUBLISHER: John Benjamins
YEAR: 2020

REVIEWER: Ylva Biri, University of Helsinki

SUMMARY

How can corpus linguistic methodology contribute to the study of language and
communication in social media, whether in terms of stylistic variation or
community jargon? In response to the recent upsurge in big data methodology
and qualitative linguistic approaches to social media, “Corpus Approaches to
Social Media” brings together eight studies that apply corpus linguistic
methods to various social media datasets. As the editors, Sofia Rüdiger and
Daria Dayter, explain in their introduction, the work builds on a workshop on
corpus approaches to social media held as part of the 40th conference of the
“International Computer Archive of Modern and Medieval English” (ICAME40) in
2019 in Neuchâtel. 

The eight empirical chapters of the volume are divided into three thematic
parts, each part sharing a focus of interest or theoretical background. The
studies in Part 1 of the volume apply corpus linguistics to study social media
communities, an area of study that has traditionally attracted mainly
qualitative methods. 

In Chapter 1, Sven Leuckert and Martin Leuckert draw on the theoretical lens
of the Communities of Practice (CoP) framework (Lave and Wenger 1991) to
assess how CoP criteria fit different groups on Reddit, a discussion website.
Structured around the three criteria for CoP – mutual engagement, shared
goal/enterprise, and shared repertoire – the study uses corpus linguistic
methods to identify frequent words and discursive differences in mentions of
the word “community”. Data on user activity in different Reddit communities
are used to track cross-community interests and connections. The authors argue
that while CoP theory does not fit Reddit groups perfectly, it can serve as
a theoretical basis for corpus linguistic studies of participation and
sociolinguistic dynamics on a social media platform.

In Chapter 2, Lisa Donlan analyses the discourse and power structures of a
Reddit group for fans of pop music. The chapter presents a case study of the
community’s debate surrounding the word “wig” in the sense of “surprise or
enthusiasm”, which some users perceive as overused and worth banning. By
combining frequency data with ethnographic and discourse analytical
observations, Donlan shows how the implementation of community rules and word
censorship is negotiated between officially sanctioned moderators and
established active community members. Despite the study focusing only on one
case, the clear structure and theoretical background of the chapter give
insight into the power dynamics at play when community members try to
negotiate their own linguistic resources. 

In Chapter 3, Daria Dayter and Sofia Rüdiger apply corpus-based discourse
analysis to study how pick-up artists refer to women. Dayter and Rüdiger show
how the ways that women are referred to in the online pick-up artist community
reflect the community mindset of treating dating as a game and women as prey.
The chapter also has a methodological goal, as the authors compare referential
strategies yielded from researcher intuition to strategies identified through
manual corpus tagging and through automatic annotation of semantic fields
combined with reverse collocation (Brezina et al. 2015). While manual tagging,
unsurprisingly, has the highest accuracy and recall, the authors recommend
less time-consuming semantic tagging as a complementary method. 
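
As an illustration of how such a comparison between methods can be scored, the
following minimal Python sketch computes precision and recall for the terms
found by one method against a reference list. It is not the authors'
procedure: the function name, the placeholder terms, and the choice of
precision as the companion measure to recall are all assumptions made for this
example.

```python
def precision_recall(found, gold):
    """Score a set of identified terms against a reference (gold) set.

    precision: share of found terms that are also in the gold set
    recall:    share of gold terms that the method found
    """
    found, gold = set(found), set(gold)
    hits = found & gold
    precision = len(hits) / len(found) if found else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Invented placeholder data, not taken from the chapter.
gold_terms = {"term_a", "term_b", "term_c", "term_d"}  # e.g. from manual tagging
intuition_terms = {"term_a", "term_b", "term_x"}       # e.g. from researcher intuition

precision, recall = precision_recall(intuition_terms, gold_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A method that finds few terms can still score high on precision while missing
much of the reference list, which is why the two values are read together.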

Part 2 focuses on short texts and linguistic variation, also addressing
related methodology. 

In Chapter 4, Samuel Felder researches intra-individual linguistic
accommodation in dyadic communication on WhatsApp. The study analyses how two
individuals chatting with each other might over time change their linguistic
behaviour; the focus is on accommodation for increased similarity, though
other patterns are considered as well. Using a corpus of German-language
WhatsApp messages collected in Switzerland, Felder identifies users’ change in
use of emojis, exclamation marks, written laughter particles (e.g. “haha”), as
well as in Swiss German dialectal spelling variation. The discussion of
individual representative cases reveals the presence of patterns rather than
the frequency of a trend, but the patterns nevertheless show how adults might
over time shift their linguistic behaviour.

In Chapter 5, Aatu Liimatta tackles the methodological issue of comparing the
frequency of a linguistic item in texts of different lengths. The problem is
familiar to researchers of social media and other genres with text length
variation, because frequencies normalized, for example, to a base of 1,000
words are misleading when comparing, say, a text of 10 words to a text of 500
words. Illustrating his arguments with a dataset of Reddit posts, where text
word counts range from 0 to 2,000, Liimatta proposes two alternatives to
normalization, both based on comparing a text to texts of similar length
rather than to the total corpus. In the first approach, frequency values
reflect how many texts of comparable length have a lower frequency, whereas in
the second approach, frequency values are scaled around the median frequency
of texts of comparable length. 
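
The problem and the two alternatives can be sketched in a few lines of Python.
This is a rough illustration under stated assumptions, not Liimatta's
implementation: the toy data, the function names, the 20% length tolerance
used to define "comparable length", and the reading of "scaled around the
median" as a simple difference from the median are all hypothetical.

```python
from statistics import median


def per_thousand(count, length):
    """Conventional normalization: occurrences per 1,000 words."""
    return 1000 * count / length


def comparable(texts, length, tolerance=0.2):
    """Texts whose length lies within +/- tolerance of the given length."""
    low, high = length * (1 - tolerance), length * (1 + tolerance)
    return [(c, n) for c, n in texts if low <= n <= high]


def share_lower(texts, count, length, tolerance=0.2):
    """Share of comparable-length texts with a lower relative frequency."""
    peers = comparable(texts, length, tolerance)
    if not peers:
        raise ValueError("no texts of comparable length")
    own = count / length
    return sum(1 for c, n in peers if c / n < own) / len(peers)


def median_scaled(texts, count, length, tolerance=0.2):
    """Relative frequency minus the median of comparable-length texts."""
    peers = comparable(texts, length, tolerance)
    if not peers:
        raise ValueError("no texts of comparable length")
    return count / length - median(c / n for c, n in peers)


# Toy corpus of (item count, text length in words) pairs; invented values.
texts = [(0, 10), (1, 10), (0, 11), (0, 9),
         (5, 500), (10, 500), (20, 480), (2, 520)]

# One hit in 10 words looks five times "denser" than ten hits in 500 words.
print(per_thousand(1, 10), per_thousand(10, 500))

# Compared only with texts of similar length, the picture is less extreme.
print(share_lower(texts, 1, 10), share_lower(texts, 10, 500))
```

In the toy data, the short text with a single occurrence gets a normalized
frequency of 100 per 1,000 words against 20 for the long text, whereas the
length-aware values place both within the range of their peers.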

In Chapter 6, Martin Eberl continues on the topic of text length, asking how
and whether the linguistic structure of tweets was affected by Twitter’s 2017
change of character limit from 140 to 280 characters. The former constraint of
a maximum of 140 characters has been theorised to be a reason behind the
ubiquitous use of contractions, clippings and emojis on Twitter. Eberl tests
this claim by comparing the use of these features in tweets posted before the
limit change with their use in tweets posted after it. While his findings
suggest a linguistic
change due to the switch, there is no straightforward interpretation linking
the switch to either the increase or decrease in the use of the features.
Rather, the frequency data also hint at a complex relationship between the
features and other, unknown variables. 

Linguistic studies of social media often overlook images, and the studies in
Part 3 are a welcome contribution to studying multimodality in social media.

In Chapter 7, Alex Christiansen, William Dance, and Alexander Wild propose a
method for automatically annotating a multimodal corpus of tweets. Images are
a prevalent part of many tweets, yet corpus linguistics still lacks methods
enabling automatised archival, annotation, and retrieval of image data in a
corpus. In this chapter, the authors describe the compilation of a multimodal
Twitter corpus, where tweet text data are accompanied by image descriptions
acquired through computer vision AI, OCR scanning, and retrieval of contextual
labels from the web. Use of image content data is showcased in an empirical
section that finds discursive differences between tweets that mention “Donald
Trump” in the tweet text and tweets with mentions in the image only. Though
the reliability of the automatised annotation remains unclear, the annotation
of the sample tweets is convincing, and the analysis exemplifies the need for
linguists also to consider image content.

In Chapter 8, Luke C. Collins presents a case study on the branding strategy
of the Facebook page of a local UK business. The images in the corpus are
classified based on image type; frequency and collocation analysis are used to
investigate what types of words and emojis the different image types and the
brand in general are associated with. In describing how emojis and images
contribute
to branding by conveying interpersonal warmth and localness, Collins also
shows how corpus linguists can include image data for quantitative methods to
yield insight into social media beyond text. 

The volume ends with a response chapter by Claire Hardaker, which links
together the studies in the volume. The internet as a medium of communication
is so recent, Hardaker argues, that it is still a challenge to predict how it
might develop. For corpus linguistics, this requires the fine-tuning of
analytical methods for computer-mediated communication, including new and
future computer-mediated activities. 

EVALUATION

Social media is becoming an increasingly important data source for different
fields of linguistics, including corpus linguistics. “Corpus Approaches to
Social Media” is a welcome collection on the compilation and analysis of
social media corpora. As also pointed out by the editors in their introduction
of the volume, research on social media covers a range of methodologies, from
highly quantitative big data to more qualitative studies. Corpus linguistics
has the potential to bridge the gap by providing quantitative methods and
linguistic understanding. 

At 208 pages, the volume is not overly long, yet it addresses a range of
themes and study questions. The potential applications of corpus linguistics
are evidenced by the distinct foci of the parts the volume is divided into
(social media communities, short texts, and image data). The volume’s varied
combinations of methods also indicate the range of questions that corpus
linguistics might answer: the studies combine corpus analysis with
discourse analysis (see Ch. 2 by Donlan; Ch. 3 by Dayter and Rüdiger) and with
variationist approaches (Ch. 1 by Leuckert and Leuckert; Ch. 4 by Felder; Ch.
6 by Eberl). The volume also makes some innovative methodological proposals
to corpus analysis and compilation (Ch. 5 by Liimatta; Ch. 7 by Christiansen,
Dance and Wild), which can shed light on issues that are by no means limited
to social media linguistics.

The volume includes both studies that consider social media as a data source
and studies that see social media as the object of study. Dayter and Rüdiger's
(Ch. 3) and Felder's (Ch. 4) contributions use data from social media to reach
conclusions related to more general offline language use. In the other six
chapters, the focus is more on social media as a setting with a particular
language use. Considering the differences in writing on social media versus
other settings, the volume has merit in also looking at social media
idiosyncrasies, such as the use of punctuation and emojis (Ch. 4 by Felder;
Ch. 6 by Eberl; Ch. 8 by Collins). The range of social media platforms studied
is another one of the volume’s strengths. In particular Reddit, used as a data
source in three of the eight studies, seems to have potential as a resource
for various linguistic questions, especially when it comes to comparing
different groups. Due to challenges in data acquisition, WhatsApp and Facebook
are currently understudied, considering their popularity and potential for
linguistic inquiry; this makes the contributions by Felder and by Collins
welcome additions. 

Stylistically, all contributions are well-written, concise, and built around
clearly phrased research questions. Within the thematic parts, the volume
seems cohesive, and the index helps readers find topics across chapters. The
use of
illustrative tables or colour figures is generous but by and large not
superfluous. Each chapter describes the social media platform it uses.
However, due to the focus of the volume, it is best suited for readers who
already have a basic understanding of corpus linguistic thinking and research
design. 

Some of the chapters could have benefited from a narrower focus or from more
space. For example, Chapter 1, in choosing to address all three CoP criteria,
seemed to run out of space: the word “community” is not the only way
to refer to a community, and the frequent vocabulary was discussed only
briefly. Thus, the three parts provide snapshots that address the research
questions without quite answering them. Meanwhile, Chapters 4 and 8 focus
on a small set of linguistic features, and while this may have been due to
lack of space or the authors disseminating their results elsewhere, seeing a
bit more might have made the arguments more convincing. These issues, however,
do not take away from the clarity of the texts or the overall research
designs. 

While an enjoyable read, the substance of the volume as a whole remains brief.
The impact of the volume might have benefited from a more in-depth theoretical
chapter or recommendations. For example, the introductory chapter briefly
highlights interdisciplinary research and data access and ethics as desiderata
for future research. These issues go largely unmentioned in the chapters that
follow. Individual examples of data access are provided, especially in
Collins’s (Ch. 8) case of a Facebook page. Meanwhile, Donlan's (Ch. 2)
combination of corpus linguistics and ethnography might illustrate a more
interdisciplinary use of corpus linguistics. Nevertheless, in the end, the
reader is left without more concrete takeaways on how the important and valid
desiderata mentioned in the introduction might be addressed. 

Another understandable limitation is that the volume’s empirical studies are
all based on monolingually Anglophone social media, with the welcome exception
of Felder's study of written Swiss German. This is likely because the volume
stems from a workshop at the International Computer Archive of Modern and
Medieval English (ICAME) conference. Hopefully this volume will pave the way
for more work also on non-English or multilingual social media corpora. 

These minor shortcomings notwithstanding, I appreciate the book as a
collection on corpus linguistic approaches. The topics studied here in social
media data, with the particular characteristics and constraints of social
media writing, could be just as relevant to data acquired from other sources
and media. On that note, social media researchers will know that, with the
rapid change of social media, research on individual social media platforms
will soon become outdated, even if (as discussed in Hardaker’s response
chapter) the fundamentals of human communication remain similar. There are a
number of individual studies that use quantitative or qualitative corpus
linguistics on social media data, especially ones using (Critical) Discourse
Analysis, so it seems strange that there are no earlier dedicated
hardcover collections on the topic. As a collection of diverse approaches and
platforms, “Corpus Approaches to Social Media” is a worthwhile pioneer volume.

REFERENCES

Brezina, Vaclav, McEnery, Tony and Wattam, Stephen. 2015. Collocations in
context: A new perspective on collocation networks. International Journal of
Corpus Linguistics 20(2): 139-173. https://doi.org/10.1075/ijcl.20.2.01bre

Lave, Jean and Wenger, Étienne. 1991. Situated Learning: Legitimate Peripheral
Participation. Cambridge: Cambridge University Press.

ABOUT THE REVIEWER

Ylva Biri is a PhD Candidate in the Doctoral Programme in Language Studies,
University of Helsinki. Her research interests include sociolinguistics,
register analysis and computer-mediated communication. Her PhD research uses
corpus linguistics to explore stylistic variation and the pragmatics of
communication in interest-based social media groups. Ms Biri holds an MA in
English Philology from the University of Helsinki, with minor studies in
General Linguistics and Pedagogy.
