31.3455, Review: Applied Linguistics: McNamara, Knoch, Fan (2019)

The LINGUIST List linguist at listserv.linguistlist.org
Tue Nov 10 19:57:37 UTC 2020


LINGUIST List: Vol-31-3455. Tue Nov 10 2020. ISSN: 1069 - 4875.

Subject: 31.3455, Review: Applied Linguistics: McNamara, Knoch, Fan (2019)

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Jeremy Coburn <jecoburn at linguistlist.org>
================================================================


Date: Tue, 10 Nov 2020 14:57:08
From: Carmen Ebner [ebner.c at cambridgeenglish.org]
Subject: Fairness, Justice and Language Assessment

 
Discuss this message:
http://linguistlist.org/pubs/reviews/get-review.cfm?subid=36543298


Book announced at http://linguistlist.org/issues/30/30-2210.html

AUTHOR: Tim McNamara
AUTHOR: Ute Knoch
AUTHOR: Jason Fan
TITLE: Fairness, Justice and Language Assessment
SUBTITLE: The role of measurement
SERIES TITLE: Oxford Applied Linguistics
PUBLISHER: Oxford University Press
YEAR: 2019

REVIEWER: Carmen Ebner

SUMMARY

Consisting of nine chapters, Fairness, Justice and Language Assessment
provides a detailed introduction and overview of Rasch analysis, which is a
type of psychometric measurement used to analyse categorical data not only in
language assessment, but also in fields such as healthcare and social science
research. The aims of this book are twofold: besides exploring the distinction
between the concepts ‘fairness’ and ‘justice’ and their role in language
assessment, the main focus lies on demonstrating the usefulness of various
Rasch measurements in increasing fairness in language assessments. By using
Rasch analysis, it is possible, for instance, to identify the difficulty of
each test item, which enables test creators to improve the test’s fairness by
revising the test item composition.
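The core idea can be sketched in a few lines of Python with hypothetical response data. This is a crude proportion-based approximation, not the calibration procedure the book or programs such as Winsteps actually use, but it shows the principle: items answered correctly by fewer test takers receive higher difficulty estimates.

```python
import math

def estimate_difficulties(responses):
    """Crude item-difficulty estimates from dichotomous (0/1) response data.

    For each item, take the log-odds of an incorrect response. A full Rasch
    calibration estimates person abilities and item difficulties jointly;
    this rough stand-in only illustrates the ordering of items by difficulty.
    """
    n_persons = len(responses)
    difficulties = []
    for item in range(len(responses[0])):
        p_correct = sum(row[item] for row in responses) / n_persons
        difficulties.append(math.log((1 - p_correct) / p_correct))
    return difficulties

# Hypothetical data: each row is one test taker, each column one test item.
responses = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0],
]
for i, d in enumerate(estimate_difficulties(responses), start=1):
    print(f"item {i}: difficulty ~ {d:+.2f}")
```

A test creator inspecting such estimates could, for instance, spot that the last item is markedly harder than the first three and rebalance the item composition accordingly.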

Starting with a general description of test validity, McNamara et al. draw on
Samuel Messick’s (1989, p. 13) definition of validity as “an integrated
evaluative judgement” of how appropriate and adequate inferences about a test
taker’s abilities are, based on test scores. Fairness and justice tie in
closely with the concept of validity as they specifically target the
inferences’ appropriateness and adequacy. Yet, it is important to emphasise
the difference between fairness and justice. McNamara et al. (2019, p.10)
describe fairness as an internal quality of language assessments that includes
for instance various types of rater effects: Does the language background of
test raters have an influence on their assessment? Do novice examiners rate
test takers’ performances differently than veteran raters? Justice, on the
other hand, is considered external to the test and deals with how the test is
being used by society. The book contains a few examples illustrating this
concept, such as the widely debated suggestion to use the International
English Language Testing System (IELTS) as a means of proving language
proficiency in Australian citizenship applications (McNamara et al., 2019, pp.
192-193). Whether the use of IELTS, which covers all four language skills
(writing, reading, listening, and speaking), is appropriate and adequate for
obtaining Australian citizenship is a valid question. The proposed pass mark
of Band 6 (CEFR B2 level) in all four skills would make the Australian
citizenship language requirements tougher than any European equivalent
(McNamara et al., 2019, p. 192).

The main body of the book (Chapters 3 to 5) contains a descriptive
introduction to four main Rasch models: the basic Rasch model, the Andrich
rating scale model, the partial credit model and the many-facets Rasch model.
The authors not only explain statistical concepts and measurements for each
model, but also provide a step-by-step illustration of how the different types
of Rasch models are implemented in the supplementary material, which is
available on accompanying websites and also contains the exercise files on
which the book’s examples are based. While the basic Rasch model is used for
dichotomous categorical data, such as incorrect/correct questions, the Andrich
rating scale and the partial credit models are generally used for polytomous
data. The Andrich rating scale model is mainly used for the assessment of
performative skills (e.g. longer sections of speech or writing) which requires
the use of Likert or semantic differential scales (e.g. scales ranging from
strongly agree to strongly disagree). Partial credit models, on the other
hand, are used for shorter responses to comprehension tasks (i.e. listening or
reading), which could be scored as 0, 1, and 2, for instance. What is
essential, however, is that none of these Rasch models takes the rater into
account as a potentially influential factor. To assess and capture the rater’s
potential influence on test scores, the many-facets Rasch model can be used.
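The contrast between the basic model and the many-facets extension can be sketched in Python. This is a hypothetical illustration, not code from the book’s supplementary materials: the basic model predicts a response from ability and item difficulty alone, while the many-facets model adds further facets such as rater severity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_correct(ability, difficulty):
    """Basic dichotomous Rasch model: probability of a correct response
    depends only on the gap between person ability and item difficulty."""
    return sigmoid(ability - difficulty)

def p_positive_rating(ability, difficulty, rater_severity):
    """Many-facets sketch: a severe rater lowers the expected score just as
    a harder item would (simplified to the dichotomous case here)."""
    return sigmoid(ability - difficulty - rater_severity)

# A test taker whose ability matches the item difficulty has a 50% chance:
print(p_correct(0.5, 0.5))               # 0.5
# The same performance, judged by a severe rater, scores lower on average:
print(p_positive_rating(0.5, 0.5, 1.0))  # ~0.27
```

Estimating the rater-severity parameter for each rater is precisely how the many-facets model makes rater effects (novice versus veteran raters, raters’ language backgrounds) visible and correctable.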

In Chapter 6, to illustrate the use of Rasch models in the field of language
testing, McNamara et al. compiled an overview of studies applying this method
to investigate fairness in language assessment. Drawing on a similar survey
covering the period from 1984 to 2009 (McNamara and Knoch, 2012), the authors
point to the growing popularity of Rasch analysis. Chapters 7 and 8 are
theoretical in nature and provide more background information on the
development of the different types of Rasch models, a further distinction
between them, as well as a section on criticism directed at Rasch methods.
This criticism centres on the fact that Rasch analysis focuses on item
difficulty, whereas other types of Item Response Theory (IRT) analysis also
consider parameters such as test takers’ guessing behaviour.

Besides summarising the two main aims of the book in the conclusion, McNamara
et al. include a discussion of the issue of justice in language assessment,
illustrated by a few examples of inappropriate and inadequate, hence unjust,
uses of language tests. The proposal mentioned above to use IELTS as a
requirement for a successful Australian citizenship application is one of
these examples.

EVALUATION

McNamara et al.’s Fairness, Justice and Language Assessment (2019) constitutes
a solid introduction to Rasch analysis. The book contains enough theoretical
and historical background for readers to contextualise this method in the
field of language assessment and to recognise the advantages of applying
Rasch models. Fulfilling one of the book’s main aims, the authors
provide a convincing argument for the use of Rasch measurements to explore and
increase test fairness. 

Written in a straightforward and instructive manner, the book is accessible
to students and scholars who wish to gain an elementary understanding of how
Rasch models can be used to address issues of fairness.
What makes this book particularly useful are the excellent supplementary
materials through which the reader obtains a guided hands-on experience with
the different Rasch models. Unfortunately, these materials were not
incorporated in the book and can only be accessed online. While the authors
cite space limitations as the reason for separating the hands-on exercises
from the theory and explanations, one comprehensive guide to Rasch analysis
would have facilitated a more natural and structured processing of the, at
times, complex statistical procedures. In addition, novices to Rasch analysis
could have benefited further from the inclusion of a glossary.

Fairness, Justice and Language Assessment is generally well organised and
particularly well written. While McNamara et al. have included a good amount
of theoretical and historical background regarding the evolution of different
Rasch models, Chapters 7 and 8 contain some general background information
which would have been better placed at the beginning. For instance, these two
chapters offer a good overview of the family of Rasch models and its
relatives, which could have appeared in one of the introductory chapters to
help the reader contextualise the Rasch models better. It
is, however, commendable that the authors also included a brief section on two
alternatives to Rasch models, Generalizability theory, also known as G-theory,
and Structural Equation Modelling in Chapter 8. The descriptions of these
alternative approaches and their comparisons to Rasch models are very well
written, albeit brief.   

Overall, McNamara, Knoch and Fan have very competently illustrated how Rasch
models can be used to address fairness as a test’s internal quality. Using
relatable and well-explained language assessment examples, the authors
describe the key output of the software programs (e.g. Winsteps and FACETS) in
detail, both in the book and the supplementary materials. Thus, the reader
gets a clear explanation and demonstration of how to interpret the software’s
output, such as Wright maps, item tables and category probability curves. 

While fairness is covered extensively by the authors, the issue of justice in
language assessment could have been elaborated on in more depth. Drawing on
Messick (1989), the authors make clear that “language testing is a thoroughly
social, even political activity” (McNamara et al., 2019, p. 197), which
requires taking the social and political contexts of language testing into
account. With language being a social medium, the importance of the social
dimension of language testing has already been mentioned by McNamara and
Roever (2006). The aforementioned case of proposed changes to the Australian
citizenship application serves as an excellent example which illustrates the
social and political characteristics of language assessment. Whether and how
Rasch methods could be used to address injustice in language assessment
constitute interesting questions which require a more in-depth explanation and
discussion. 

REFERENCES

Messick, Samuel. 1989. Validity. In R.L. Linn (ed.). Educational Measurement.
3rd ed. New York: American Council on Education & Macmillan. 13-103.

McNamara, Tim & Knoch, Ute. 2012. The Rasch wars: the emergence of Rasch
measurement in language testing. Language Testing 29(4). 555-576.

McNamara, Tim & Roever, Carsten. 2006. Language Testing: The Social Dimension.
Malden: Blackwell.


ABOUT THE REVIEWER

Dr Carmen Ebner is a sociolinguist currently working for Cambridge Assessment
English as a Projects Assistant. Her PhD examined attitudes towards
stigmatised and disputed usage features in British English. Carmen’s research
interests include language variation and change, historical sociolinguistics,
corpus linguistics and language and identity.




