26.63, Review: Socioling; Text/Corpus Ling: Friginal, Hardy (2014)
    The LINGUIST List via LINGUIST 
    linguist at listserv.linguistlist.org
       
    Tue Jan  6 19:12:34 UTC 2015
    
    
  
LINGUIST List: Vol-26-63. Tue Jan 06 2015. ISSN: 1069 - 4875.
Moderators: Damir Cavar, Indiana U <damir at linguistlist.org>
            Malgorzata E. Cavar, Indiana U <gosia at linguistlist.org>
Reviews: reviews at linguistlist.org
Anthony Aristar <aristar at linguistlist.org>
Helen Aristar-Dry <hdry at linguistlist.org>
Sara Couture, Indiana U <sara at linguistlist.org>
Homepage: http://linguistlist.org
Editor for this issue: Sara Couture <sara at linguistlist.org>
================================================================
Date: Tue, 06 Jan 2015 14:12:14
From: Irene Checa-Garcia [irene.checa at gmail.com]
Subject: Corpus-Based Sociolinguistics
Book announced at http://linguistlist.org/issues/25/25-678.html
AUTHOR: Eric Friginal
AUTHOR: Jack A. Hardy
TITLE: Corpus-Based Sociolinguistics
SUBTITLE: A Guide for Students
PUBLISHER: Routledge (Taylor and Francis)
YEAR: 2014
REVIEWER: Irene Checa-Garcia, University of Wyoming
Review's Editor: Helen Aristar-Dry
SUMMARY 
As the subtitle indicates, “Corpus-Based Sociolinguistics: A Guide for
Students” is intended to be a student book. In the preface the authors define
their target audience by stating that “this book can effectively guide
student-researchers in upper-level undergraduate and graduate courses in
sociolinguistics” (p. xv). The increasing number of studies in
sociolinguistics adopting a Corpus Linguistics (CL) approach and the solid
quantitative and empirical methodology that such an approach offers inspired
the authors to write a manual on the use of CL applied to sociolinguistic
inquiries. 
The book consists of three blocks. Block A introduces both sociolinguistics
and CL (main goals, methodologies, and a brief history), with somewhat more
focus on CL; Block B presents an overview of work in several popular areas of
sociolinguistic research; and Block C discusses practical methodological
issues for the researcher who wants to use a corpus in a sociolinguistic
inquiry. 
All chapters contain sections named “Reflective break” with two or more
stimulating questions that require the application of and reflection upon the
content previously presented, often along with suggestions for further
research. Most chapters include examples or summaries of papers that address a
sociolinguistic issue using the CL-oriented method discussed in that chapter. In
addition, maps, histograms, tables summarizing results, tagging examples,
etc., help illustrate the content provided throughout the book. Interviews
with well-known CL and sociolinguistics researchers are included in some
chapters: Grieve (B-1), Tagliamonte (C-1), and Biber (C-2). 
The first block of the book dedicates one chapter to introducing
sociolinguistics, three chapters to introducing CL, and one final chapter to
discussing CL's application to sociolinguistics. The chapter on sociolinguistics
briefly defines the discipline and summarizes in 1-2 paragraphs five main
sociolinguistic approaches: ethnography of communication, interactional
sociolinguistics, conversation analysis, experimental sociolinguistics, and
variationist sociolinguistics. In addition, the authors group sociolinguistic
enterprises into two categories: quantitative and qualitative, giving examples
of works using each of them with emphasis on the type of questions answered.
The following chapter defines CL and offers a brief history of the field, from
early applications to dictionary making to the newest collection of electronic
megacorpora. Different kinds of corpora (specialized vs. general; spoken vs.
written), their sizes, and how they are typically collected are discussed in
the next chapter. This chapter also introduces basic concepts in CL research,
such as normalized frequency, n-grams, lexical bundles, keywords, and tagging,
among others, which are mentioned again in later sections of the book. The
chapter closes with a review of some software available to search corpora for
instances of these concepts, although software popular among corpus linguists
and sociolinguists, such as VARBRUL (Cedergren and Sankoff 1974) and the R
language (Gries 2009), is not reviewed. Next, the authors present the notion of
representativeness of a corpus, in terms of target population and variety of
registers and linguistic diversity. They recommend creating a corpus matrix
and they offer examples. In addition, they suggest resorting to
ethnographic/qualitative studies of the target community to help design such
matrices more accurately. All these topics are revisited in the third
block in a more detailed and hands-on manner. Block A ends with a discussion
of the limitations and future directions of corpus approaches to
sociolinguistics, preceded by a sketch of how CL has been used for some
sociolinguistic topics. The limitations of such an approach are of two types:
limitations in how corpora encode social variables and account for them in
sampling, and the impossibility of applying this methodology to some
sociolinguistic areas, for instance language policy. 
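Two of the basic concepts named above, normalized frequency and n-grams, can
be illustrated with a minimal Python sketch. It is not taken from the book
under review; the function names, the toy sentence, and the per-million
default are assumptions made only for illustration.

```python
from collections import Counter


def normalized_frequency(count, corpus_size, per=1_000_000):
    """Scale a raw count to a rate per `per` words so that corpora of
    different sizes can be compared (the per-million default is an assumption)."""
    return count * per / corpus_size


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy data: a hypothetical, already tokenized utterance.
tokens = "i do not know what you mean i do not know".split()
bigram_counts = Counter(ngrams(tokens, 2))

# A word occurring 150 times in a 300,000-word corpus, per million words.
rate = normalized_frequency(150, 300_000)
```

Recurrent n-grams counted this way are the raw material for lexical bundles,
which additionally require frequency and dispersion thresholds chosen by the
researcher.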
The second and most extensive part of the book exemplifies the application of
CL to several popular sociolinguistic areas: dialectology, studies on gender,
sexuality and age, politeness and stance, workplace discourse, diachronic
variation, and web registers. Each chapter focuses on one topic. It first
introduces the relevant sociolinguistic notion, typically discussing some
results of studies using corpora and offering a more detailed summary of one
or two key studies on the topic thereafter. The corpora used are described in
detail, and in some chapters (B-1 on dialectology, B-4 on workplace discourse,
and B-5 on diachronic variation) a quite complete list of available corpora is
included. Corpora are described in terms of sampling variables, collection
methods, size, social variables annotated, and, where relevant, the search
tools they offer for analysis. In the case of dialectology, a
topic with a very rich tradition in sociolinguistics, the authors comment on
older studies which led to more extensive use of corpora in the present.
Another area of frequent corpus use has been gender. For stance, the
limitations of CL analysis are pointed out, since corpora are not typically
annotated for prosodic and phonetic features that can mark stance. Finally,
for some areas (stance, language change, web registers), analyses of word
trends and content, in the line of culturomics research rather than
linguistics, are included in the review of studies. 
The third and final part of the book deals with some practical advice and
descriptions of procedures with which to perform a sociolinguistic
corpus-based study. The first chapter explains how to create a sociolinguistic
corpus. After emphasizing the importance of well-defined questions that guide
the sampling design and its justification, a model study is presented.
Stratified random sampling is recommended together with very general
guidelines concerning statistical test assumptions and social categories that
are considered. Specific formats and format handling along with file
organization are suggested. Finally, ethical and legal considerations from
Scocco (2007) are summarized. Two types of data collection are discussed in
more depth: how to create a corpus from blogs, and how to achieve a
naturalistic sociolinguistic interview. The next chapter presents the
Multidimensional Analysis (MD) model (Biber 1988). First, an account of the
rationale of the model and its main achievements is presented. In an
interview, Biber evaluates the main findings of this type of
analysis as well as criticisms and how those could be addressed. MD detects
linguistic features that tend to occur together and groups them into factors
or dimensions whose relation to social variables can be tested statistically.
Then the chapter explains the MD procedure. The authors warn the reader,
though, that they offer only a general description of the procedure rather
than detailed instructions, and they provide a bibliography for readers who
want to learn how to carry out this analysis precisely. The various steps are
exemplified by Friginal’s (2009) work, and an interpretation of the results is
provided as a means of
exemplifying MD’s explanatory power. The last two chapters present additional
CL methods to study variation in language, both diachronic (C-3) and
synchronic (C-4). The authors offer advice for diachronic data collection over
the internet and many suggestions for research questions in this area. The
last chapter describes a step-by-step procedure to determine keyness in a
corpus with respect to a reference corpus using AntConc. Examples of studies
using automatic taggers close the book. One uses the LIWC tagger (Linguistic
Inquiry and Word Count), which tags content, grammatical, and even spoken
language features. The other uses POS (Parts of Speech) tagging, whose results
can be correlated to different sociolinguistic variables. 
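The idea of keyness mentioned above can be sketched in Python. This is a
minimal illustration, not the book's procedure and not AntConc's
implementation: it assumes a log-likelihood statistic of the kind commonly
used for keyness, and the function names and toy token lists are
hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood score comparing a word's frequency in a target
    corpus with its frequency in a reference corpus."""
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(target_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the target corpus."""
    target_counts = Counter(target_tokens)
    reference_counts = Counter(reference_tokens)
    size_t, size_r = len(target_tokens), len(reference_tokens)
    scored = []
    for word, freq_t in target_counts.items():
        freq_r = reference_counts[word]
        if freq_t / size_t > freq_r / size_r:
            scored.append((word, log_likelihood(freq_t, size_t, freq_r, size_r)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy data standing in for a blog sample and a general reference corpus.
target = "lol omg lol the cat lol".split()
reference = "the cat sat on the mat".split()
ranked = keywords(target, reference)
```

With real corpora, a higher score indicates a word whose frequency in the
target corpus departs more strongly from what the reference corpus predicts;
the cut-off for treating a word as a keyword is a choice left to the analyst.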
EVALUATION 
“Corpus-Based Sociolinguistics” is mostly a practical introduction to the use
of CL for sociolinguistic questions. However, rather than offering a
step-by-step guide on how to make a sociolinguistic analysis using CL, this
book mainly offers examples of areas of application of CL within
sociolinguistics, and relevant bibliography to explore further how to apply CL
methods. In this respect, this work can be very helpful in two ways to the new
sociolinguist who wants to use CL. 
First, it can serve to identify the ideal corpus for a myriad of research
topics in sociolinguistics and even content analysis and culturomics, as long
as the research is on the English language. There is little reference to
corpora in languages other than English, and where there is, the coverage
is not exhaustive. For instance, for Spanish there is no mention of one
of the largest and widest-ranging corpora representing different varieties of
Spanish, the PRESEEA (Moreno Fernández 1996). However, such a review would be
beyond the scope of the book. As for the English corpora, the authors often include
the links to access them online or contact information to inquire about them,
as well as what software is available to analyze them. Diachronic, synchronic,
electronic, and specialized corpora, among many other corpora types, are
referenced. 
The other aspect in which the book can be very helpful for sociolinguistic CL
research is the creation of a corpus, particularly from the internet.
Representativeness and feasibility are presented in a very clear and practical
manner and good models are offered and discussed. In addition, the book points
to online resources such as blogs and the websites that index them, and gives
advice on how to organize the corpus files. Sociolinguistic research on the
new internet registers is also reviewed, which constitutes a novelty in the
literature: the other general presentation of CL and sociolinguistics
(Baker 2010) barely touches upon it, which is to be expected since the
boom in this new research area is very recent. 
On the other hand, this book cannot be taken as a guide to carrying out the
different CL analyses reviewed, such as MD or LIWC. Neither the required
software nor the statistical knowledge is explained in enough detail, as
noted by the authors themselves. Instead, the book refers to relevant
specialized books for those tasks. Also, despite the authors' claim in the
preface, anyone using this book as a textbook for an introductory class on
sociolinguistics would find a supplementary sociolinguistics manual very
valuable. Although the book does make a successful effort not to assume any
prior sociolinguistic knowledge, and every sociolinguistic concept is
discussed before studies on it are reviewed, these notional introductions are
not always very clear and are often very brief, with the majority of the
chapter dedicated to reviewing studies on the concepts and the corpora used. 
Another area the book does not spend much time covering is multimedia corpora.
This is a consequence of the authors' conceptualization of a corpus as a
collection of searchable tagged texts. Therefore, there is little or no review
of studies concerning video data or sound data, nor of software to align
multimedia with transcription, such as ELAN. 
Few books have yet been published that offer an account of how to do
sociolinguistics with CL or that describe the relationship between the two
disciplines. The one exception is Baker (2010), “Sociolinguistics and Corpus
Linguistics”. As mentioned, Baker’s book pays little attention to
Computer Mediated Communication of any kind, and it possibly explains fewer
basic sociolinguistic and CL notions, but its explanations are more in
depth. Likewise, statistical procedures are explained in more detail by Baker,
although arguably they are simpler (univariate) than MD’s factor
analysis. Also, Baker’s work pays more attention to the explicit discussion of
sociolinguistics and CL relations. By contrast, Friginal and Hardy devote a
smaller portion of the book to discussing this relationship (the last two
sections of the final chapter in Block A). Instead, the relationship between
the two disciplines arises indirectly from the review of a large quantity of
studies that employ both. Another difference in focus is the attention paid to
interactional sociolinguistics, which is explored in two chapters in Baker and
only briefly talked about in the first chapter in Friginal and Hardy.
 
In sum, “Corpus-Based Sociolinguistics: A Guide for Students” will serve the
undergraduate course on sociolinguistics if supplemented with a manual on
sociolinguistic concepts; it will then constitute an original and up-to-date
introduction to the discipline, as well as to the CL methodology. Furthermore,
it will be an even more valuable resource for the researcher new to CL who
wishes to apply this methodology to quantitative sociolinguistic questions or
wishes to know what sociolinguistic questions could be addressed with this
methodology. Although not a manual on how to do sociolinguistics with corpus
linguistics per se, it will direct the researcher to the right resources.
Finally, the thought-provoking “reflective breaks” and the numerous examples
of studies will stimulate younger students and make sociolinguistic research
more appealing, while suggesting new research questions to more advanced
students or researchers. 
REFERENCES 
Baker, P. 2010. “Sociolinguistics and Corpus Linguistics”. Edinburgh:
Edinburgh University Press.
Biber, D. 1988. “Variation across speech and writing”. Cambridge: Cambridge
University Press. 
Cedergren, H. and Sankoff, D. 1974. Variable rules: Performance as a
statistical reflection of competence. “Language”, 50: 333-355. 
Friginal, E. 2009. A corpus-based study of gender and age in blogs. “Language
Forum”, 35 (2): 19-37. 
Gries, S. T. 2009. “Quantitative corpus linguistics with R: a practical
introduction”. London and New York: Routledge.
Moreno Fernández, F. 1996. Metodología del “Proyecto para el Estudio
Sociolingüístico del Español de España y de América” (PRESEEA). “Lingüística”
8: 257-287.
Scocco, 2007. Copyright Law: 12 Dos and Don’ts. “DailyBlog Tips”. Available
from http://www.dailyblogtipos.com.
ABOUT THE REVIEWER
Irene Checa-Garcia is Assistant Professor at the University of Wyoming. She
wrote her dissertation on measures of syntactic development in adolescents and
the social factors influencing it. During her postdoctoral years, at the
University of León and the University of California, Santa Barbara, she worked
on the functional syntax of Spanish relative clauses using corpus linguistics
methodology. She also works on a Conversation Analysis project on very young
children's embodiment of action and on the morphosyntactic development of young
Spanish-English bilinguals with and without Specific Language Impairment. Her
main interests include quantitative sociolinguistics, bilingualism,
grammaticalization patterns, and conversation analysis of very young
children's interactions.