30.280, Review: Language Acquisition; Phonetics; Phonology; Sociolinguistics; Text/Corpus Linguistics: Durand, Gut, Kristoffersen (2018)

Thu Jan 17 20:52:53 UTC 2019

LINGUIST List: Vol-30-280. Thu Jan 17 2019. ISSN: 1069 - 4875.

Moderator: linguist at linguistlist.org (Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté)
Homepage: https://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Jeremy Coburn <jecoburn at linguistlist.org>
================================================================

Date: Thu, 17 Jan 2019 15:52:39
From: Sara Diaz [sara_diaz_10 at hotmail.com]
Subject: The Oxford Handbook of Corpus Phonology

Book announced at http://linguistlist.org/issues/29/29-922.html

EDITOR: Jacques Durand
EDITOR: Ulrike Gut
EDITOR: Gjert Kristoffersen
TITLE: The Oxford Handbook of Corpus Phonology
PUBLISHER: Oxford University Press
YEAR: 2018

REVIEWER: Sara Diaz, Universidad de Extremadura

SUMMARY

The Oxford Handbook of Corpus Phonology, edited by Jacques Durand, Ulrike Gut,
and Gjert Kristoffersen, constitutes a comprehensive and diverse collection of
essays on corpus phonology, “a new interdisciplinary field of research that
has only begun to emerge during the last few years” (1). Its aim, as clearly
stated in the introduction, is to discuss possible ways to standardise corpus
compilation, annotation and exploitation. The issue of international
standardisation runs through all four parts of the book. The first part,
“Phonological Corpora: Design, Compilation, and Exploitation”, contains eight
essays that deal, as its title suggests, with essential concepts to be taken
into account when building, annotating or using a phonological corpus. The
first essay (Chapter 2) is introductory: it begins by defining what a
phonological corpus is and then addresses basic questions that anyone
building a corpus will need to answer, such as how representative and how big
the corpus should be and how it is going to be preserved. The three following
chapters develop, in turn, the processes of data collection, annotation and
automatic phonological transcription that are touched upon in Chapter 2.
Chapter 6 focuses on cluster analysis, a method for the exploitation of speech
corpora that involves statistics and linear algebra and provides useful
resources including websites, books and software. In the seventh essay, the
notions of corpus archives and data preservation and dissemination are
examined and there is an emphasis on how these notions have changed due to
technological innovation and on the need for new types of data centres.
Finally, the last two chapters of Part I are concerned with metadata and data
formats. Chapter 8 offers many recommendations and internet resources, while
Chapter 9 concentrates on formats, especially standardised formats, of
annotated spoken corpora. Part I is thus arranged chronologically, covering
the entire process of corpus compilation and use.

Part II comprises five essays that show how the use of speech corpora can
contribute to research in the field of phonology and related subfields.
While the first paper (Chapter 10) is more theoretical in that it reviews some
key terms such as ‘phonology’, ‘phonetics’, ‘corpus’ and ‘corpus-based
approach’, the other four describe practical applications of corpora in
different areas of phonological research, namely, segmental phonology in
Chapter 11, post-lexical phonology in Chapter 12, child phonological
development in Chapter 13 and second language acquisition in Chapter 14.

The third section of the Handbook is devoted to the presentation of some of
the most prominent tools and methods in the field. On the one hand, two of the
eight chapters that make up Part III (15 and 21) each provide an overview of a
stand-alone tool. In Chapter 15, Han Sloetjes presents ELAN, a multimedia
annotation tool which accepts not only audio but also video and can create
‘annotation tree structures’. The other stand-alone tool, ANVIL, is
presented in Chapter 21. ANVIL allows for the annotation of audio, video and
3D motion-capture data and, most importantly, for interoperability with other
tools. On the other hand, two other chapters describe EMU (Chapter 16) and
EXMARaLDA (Chapter 20), two systems for the analysis and management of speech
databases that integrate a collection of tools. The computer program Praat is
covered in Chapters 17 and 18, where the authors focus, first, on how this
program can be used in phonological corpus research and, second, on how the
Praat scripting language can make corpus building and analysis easier. Part
III also includes two essays on
methods for the study of phonology. In Chapter 19, Yvan Rose and Brian
MacWhinney deal with methodological issues involved in spoken data compilation
and analysis while they use the PhonBank project as an illustrative example.
Chapter 22 encourages a web-based method in the archiving and sharing of
speech corpora.

The last section, Part IV, ‘Corpora’, presents a varied selection of
corpora. There are corpora of many different languages, such as English (IViE
corpus in Ch. 23), French (the PFC programme in Ch. 24 and VALIBEL in Ch.
30), Norwegian (NoTa-Oslo and TAUS in Ch. 25), German (LeaP corpus in Ch. 26),
Danish (LANCHART corpus in Ch. 28), Dutch (Ch. 29) and Taiwanese (Ch. 32); and
of different dialects (Tyneside and Australian English in Chs. 27 and 31,
respectively). Furthermore, Part IV covers both segmental and suprasegmental
phonology and different areas and subareas of linguistics, for instance,
sociolinguistics, dialectology, first and second language acquisition and
historical linguistics.

As for the intended audience, the editors aim to reach a wide readership,
including researchers from many different fields, ranging from the most
obvious, phonetics and phonology, to others that may not be as evident, such
as language variation, second language acquisition, sociolinguistics and
dialectology.

EVALUATION

The Oxford Handbook of Corpus Phonology is the first of its kind and, in this
respect, deserves our congratulations: no other handbook has yet been
published that deals specifically with speech corpora. This gap may be due to
the fact that it was only in the 1980s and 1990s that corpora started to
develop as “tools for the linguist or applied linguist” (O’Keeffe & McCarthy,
2010, p. 5). Since then, several books have been published on corpus
linguistics, for instance, The Routledge Handbook of Corpus Linguistics and
Corpus Linguistics: An International Handbook. Nonetheless, most of these
publications serve as general introductions to corpus linguistics and, even
though they usually discuss a wide range of topics, few, if any, of those
topics are analysed in depth. By contrast, The Oxford Handbook of Corpus
Phonology addresses the application of corpus-based methods to the field of
phonology exhaustively.

As mentioned earlier, the editors’ main goal is to foster the development of
“international standards for the compilation, annotation, and analysis of
phonological corpora” (1). In this they have been only partially successful.
While some chapters address standardisation explicitly (e.g. Ch. 9), most do
not comment on the issue. However, many contributors to this book are
concerned with compatibility, interoperability, sharing, open access and
sustainability, all of which are closely related to standardisation and
contribute to its advancement. In fact, this connection is observed by Gut
and Voormann
(2017) when they say that “[t]he issues of standardization and documentation
also apply to the sharing and reuse of phonological corpora” (p. 18).
Moreover, when standardisation is dealt with, the focus is on formats. An
example of this is found in Chapter 9, where Romary and
Witt make an excellent point: “In this chapter we hope to have conveyed the
message that, within what could appear as an intricate jungle of standards, it
is possible to identify some baseline formats, allowing one to start putting
together a corpus project within some stable normative environments such as
the TEI” (p. 189). Apart from that, in the chapter on ELAN, its author calls
attention to the fact that “[t]here are ongoing efforts to establish a widely
accepted interchange format for multimodal annotation” (p. 319). 

All in all, even though there should probably have been more discussion of
standardisation, the book succeeds in conveying the powerful message that the
sharing of data is indispensable for corpus phonology to move forward.
Another reason for supporting data sharing is that, as Yvan Rose beautifully
emphasises, “[s]cientific competition should
be about ideas, not data” (p. 274).

With regard to the structure of the handbook, it could have been better
organised. To begin with, the overall structure could be improved by placing
Part II ‘Applications’ at the end. The resulting structure, i.e., Part I
‘Phonological Corpora: Design, Compilation, and Exploitation’, Part II ‘Tools
and Methods’, Part III ‘Corpora’ and Part IV ‘Applications’, seems more
coherent since it follows the very process of corpus compilation and use:
first, the corpus is designed; second, some tools and methods are chosen;
third, the corpus is built; and fourth, the corpus is used for a specific
purpose. In addition, the chapters are not consistently structured. While
some of them have an introduction, a conclusion, references and
acknowledgements, many lack one or more of these sections. Nevertheless, one
can say in the editors’ favour that consistency is not easily attained when
many contributors are involved in the writing of a book.

Despite the need to improve consistency, the chapters are interconnected and
all contribute to a comprehensive whole. They fit within the frame of the
handbook, and some contributors state this explicitly, as in Chapter 11
(“[o]ur aim within the framework of this handbook is to explore…”, p. 214)
and Chapter 20 (“[f]ollowing the focus of this book, this chapter foregrounds
the use of EXMARaLDA for corpus phonology…”, p. 402). Apart from that, the
handbook is full of cross-references, since some tools, methods or corpora
are mentioned in several chapters. Cross-referencing adds to the cohesion of
the book but also makes it somewhat repetitive at times.

Finally, another positive aspect of The Oxford Handbook of Corpus Phonology
is that it encourages future research and opens up new horizons for the field
of corpus phonology. In fact, some chapters include a final section with
titles such as ‘Future outlook’ or ‘Future development’. In
one of these sections, Yvan Rose points out “the need to develop our research
methodologies and tools supporting them in a collegial way” in order to make
open scientific standards reliable (p. 285). 

A future edition of this handbook should take into consideration the
suggestions made in this review since they can be very beneficial not only for
the editors but also, and especially, for potential readers.

REFERENCES

Lüdeling, A., & Kytö, M. (Eds.). (2008). Corpus Linguistics: An International
Handbook. Walter de Gruyter GmbH.

O'Keeffe, A., & McCarthy, M. (Eds.). (2010). The Routledge Handbook of Corpus
Linguistics. Routledge.

ABOUT THE REVIEWER

Sara Díaz Sierra has a BA in English Studies from the University of
Extremadura (Spain) and a Master's degree in Advanced English Studies from the
University of Salamanca (Spain). She is currently doing a PhD on the Northern
Irish English accent under the supervision of Professor Carolina Amador
Moreno, lecturer at the University of Extremadura. Her main interests are in
sociolinguistics, phonetics and phonology, corpus linguistics and
dialectology.
