34.743, Review: Linguistic Theories, Phonetics, Phonology: Barnes, Shattuck-Hufnagel (2022)

The LINGUIST List linguist at listserv.linguistlist.org
Fri Mar 3 18:49:43 UTC 2023


LINGUIST List: Vol-34-743. Fri Mar 03 2023. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Sarah Robinson,
      Joshua Sims, Jeremy Coburn, Daniel Swanson, Matthew Fort,
      Maria Lucero Guillen Puon, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Hosted by Indiana University

Editor for this issue: Maria Lucero Guillen Puon <luceroguillen at linguistlist.org>
================================================================


Date: Fri, 03 Mar 2023 18:49:27
From: Wenxi Fei [richabufei at gmail.com]
Subject: Prosodic Theory and Practice

 
Discuss this message:
http://linguistlist.org/pubs/reviews/get-review.cfm?subid=36863558


Book announced at http://linguistlist.org/issues/33/33-2324.html

EDITOR: Jonathan Barnes
EDITOR: Stefanie Shattuck-Hufnagel
TITLE: Prosodic Theory and Practice
PUBLISHER: MIT Press
YEAR: 2022

REVIEWER: Wenxi Fei

SUMMARY
In the field of speech prosody, there are many debates in which different
theories use similar terms in subtly different ways. This book aims to
provide an overview of the major theoretical models for those who might want
to compare and apply them. The chapters introduce the prosodic models one by
one, with comparisons and arguments. In each chapter, the authors answer a
common set of questions covering sub-topics such as phonology, meaning,
phonetics, typology, and psychological status, in order to enable better
comparisons among the models and to show what each approach does best.

The book starts with one of the most well-known models. Chapter 1 is written
by Amalia Arvaniti and is called The Autosegmental-Metrical Model of
Intonational Phonology. The Autosegmental-Metrical Model of Intonational
Phonology (henceforth AM) was based on Janet Pierrehumbert's (1980)
dissertation, and the term autosegmental-metrical was proposed by Ladd (1996).
It demonstrates the interaction of two phonological subsystems of intonation,
an autosegmental tier reflecting the melodic component of intonation and a
metrical structure representing prominence and phrasing. This model highlights
a principled distinction between the phonology of intonation and its phonetic
realization, which allows it to be adapted to studies of prosodically varied
languages and their perception. This chapter summarizes the AM phonology, its
phonetic implementation, intonation and typology, and applications of AM in
speech processing: Tones and Break Indices (ToBI) Systems and Transcription.
However, AM theory has been criticized, with regard to the reproduction of
pitch contours, by proponents of other models such as Xu's PENTA (e.g., Xu
2005) and the Fujisaki model (e.g., Fujisaki 2004). Chapter 1 is followed by a
commentary by Martine Grice, which considers both advantages and disadvantages
of AM. It emphasizes that the main advantages are the separation of tone and
text, the structure of tiers, and complex intonational primitives, and that
there is a problem related to categorization. Finally, the author responds
with two points concerning the categorization problem: one about prosodic
typology, and the other about the treatment of variability in the study of
intonation.

Chapter 2 is Modeling Danish Intonation, by Nina Grønnum. It presents an
intonational model developed specifically for Danish. After introducing the
intonation system of Danish, the chapter argues that there are three
prosodic components in the Danish sound system: stress, stød, and intonation;
if vowel length is considered a syllable prosody, that would make a
fourth. The chapter summarizes the results of acoustic analyses and a few
perceptual experiments carried out from the mid-1970s through the 1990s.
Before offering a detailed description, the author suggests that an
intonation model could represent part of speakers' production or perception,
serve as input to synthetic speech, or supply patterns for speech
recognition. Then, this chapter
summarizes some features of Danish intonation, including acoustic patterns and
psychological reality. The last part discusses the application of the
description of intonation from read speech to spontaneous speech, indicating
that the model needs to adjust the smooth progression of the stressed
syllables in prosodic phrases and utterances, as well as the gradual reduction
of the associated F0 patterns. In short, more analyses of spontaneous speech
await future investigations.

Chapter 3 is A Multilevel, Multilingual Approach to the Annotation and
Representation of Speech Prosody, written by Daniel Hirst. This model
describes several levels of representation, from a functional annotation of
intonation at the most abstract level, via an underlying phonological
representation and a surface phonological representation, to a phonetic
representation, which is directly convertible into an acoustic signal. Hirst
thus annotates several levels of representation, including prosodic
functions, prosodic structure (stress foot, tonal units, etc.), underlying
phonological representation, surface phonological representation, and
phonetic representation. The surface phonological representation of this
model is the International Transcription System for Intonation (INTSINT, Hirst
1987). Its aim is to provide a tool for the systematic description of
intonation patterns, something along the lines of a narrow transcription using
the IPA. It is thus very different from the ToBI system (Silverman et al.
1992), which presupposes that the inventory of intonation patterns for the
language being described has already been established. The prosody editor
based on this model is called ProZed (Hirst
2015), a tool allowing the researcher to experiment with different abstract
phonological models, providing an acoustic output with which, at least
informally, to evaluate the relative value of different models.

Chapter 4 is The ToBI Transcription System: Conventions, Strengths, and
Challenges, by Sun-Ah Jun. ToBI, which stands for Tones and Break Indices, is
one of the most well-known annotation systems for prosody at the level of
phonology. It transcribes the phonological properties of intonation (tones)
and the perceived degree of juncture between words (break indices). Together,
these represent the prominence patterns and prosodic structure of an
utterance. This chapter describes the theoretical underpinnings of the ToBI
system, its conventions, how it functions, and its advantages and
disadvantages. It also introduces some of the recent developments in the
prosodic transcription system that attempt to address transcription
difficulties associated with ToBI's vulnerability to the variability and
gradience of tonal categories and to reflect the importance of rhythmic or
metrical prominence distinct from pitch accent. This chapter is commented on
by Laura C. Dilley and Mara Breen, who acknowledge that the ToBI system has
facilitated the discovery of important empirical insights about the
cross-linguistic structure of intonation. However, they argue that several
serious problems with traditional AM theory lead to limitations of ToBI.
Dilley and Breen also propose an enhanced AM theory (AM+) and the Rhythm and
Pitch (RaP) prosodic transcription system (Breen et al. 2012). In her
response, Jun argues that although AM+ makes its phonological component very
rich in order to solve these problems, it is not clear whether the AM+ model
of English intonation can accommodate any fixed inventory of tonal elements or
the syntax of their combination. Returning to ToBI, Jun argues that it was not
designed to be a phonetically transparent tool.

Chapter 5 is Prosody in Articulatory Phonology, by Jelena Krivokapić. This
approach was designed to overcome the problem of relating the cognitive and
physical aspects of speech by viewing them not just as compatible but as
macroscopic and microscopic aspects of the same
representation. The microscopic properties are the physical characteristics
(i.e., articulatory, acoustic) of the macroscopic cognitive units (i.e., the
combinatorial units of speech). The model uses two modes of coupling: in-phase
(simultaneous) and anti-phase (sequential) coupling. Building on this
approach, the work on Temporal Modulation Gestures and the π-Gesture Model
(Byrd & Saltzman 2003) proposes that prosodic boundaries be viewed as prosodic
gestures (π-gestures), which are the cognitive representation of the prosodic
boundary.
Additionally, Tone Gestures (Gao 2009) further extend the gestural approach to
lexical tones. Lastly, the author discusses two possible views of prosodic
structure: the prosodic hierarchy can be understood as arising from a
hierarchy of nested planning oscillators or through the coordination of the
π-gesture, other prosodic gestures, and constriction gestures. Alice E. Turk's
comments focus on one particular aspect of the theory, that is, the coupled
oscillator approach of Articulatory Phonology to polysyllabic shortening.
There are also some additional comments on the overall approach to
general-purpose timekeeping mechanisms.

Chapter 6 is The Trouble with ToBI, written by D. R. Ladd. This chapter
explores some of the important theoretical issues within the general AM
approach. Firstly, on the topic of phonological distinctions and phonetic
categories in the AM approach, Ladd raises the practical issue that different
varieties of the same language often have slightly different systems of
phonemic contrast, as well as the problem of salient subphonemic detail in
connected speech. Secondly, he argues that an agreed standard based on one
specific autosegmental analysis of one specific variety of English has impeded
the full discussion of important theoretical issues that arise from the
general AM approach. Lastly, it remains unclear which phonetic features of
intonation are gradient and which are categorical.

Chapter 7 is The Prosogram Model for Pitch Stylization and Its Applications in
Intonation Transcription, by Piet Mertens. This chapter is about a generic,
integrated approach, covering the acoustic, perceptual, and linguistic
manifestations, as well as the relationships and mappings among them. The
analysis proceeds bottom-up, starting from acoustic parameters, moving through
pitch stylization that simulates tonal perception, and then labelling pitch
levels and movements with discrete symbols so as ultimately to obtain a
structured representation of intonation. The result is that pitch events are
aligned with
prosodic structure, forming prosodic units, with their internally structured
pitch contours. This model assumes that tonal perception is important for both
pitch stylization and symbolic transcription. Thus, the chapter discusses how
sound signals and pitch gestures are affected by tonal perception and the
prosodic features obtained during pitch stylization, such as individual
syllables and the sequences of syllables. The last part of this chapter is
about automatic symbolic transcription of both pitch level and intonation
level.

Chapter 8 is called The Kiel Intonation Model—KIM, by Oliver Niebuhr. The
model discussed in this chapter is a contour model for Northern Standard
German by Klaus Kohler and his colleagues (e.g., Kohler 1991). The ultimate
goal of the KIM is to determine all communicative functions of Standard German
prosody and relate them to distinctive prosodic features derived from detailed
descriptions and their contextual as well as individual variation. Therefore,
the KIM is obviously most suitable for German data. Nonetheless, a number of
phonological distinctions have recently also been found in a range of other
languages, such as English, Swedish, and Estonian, suggesting that the KIM is
potentially also applicable to a wider group of languages. An inventory for
prosodic labelling (PROLAB) was attached to the KIM, which translates the
phonological categories of intonation, phrasing, prominence, and emphasis into
sequences of simple ASCII letters. These symbols are annotated on a single
tier with a conventionalized syntactic structure, but there are tools that can
automatically convert these annotations into multiple-tier representations.
These tools ensure that PROLAB annotations are largely interchangeable across
different software and transcription systems such as Praat. On the practical
side, the KIM's methodological guidelines are provided. On the theoretical
side, the phonological elements of this model are presented by the author, and
some perception experiments are introduced to support the theory. Finally, the
author points out some potential problems of this model and some prospects for
further development.

Chapter 9 is The Rise and Fall of the British School of Intonation
Analysis, by Francis Nolan. Unlike the others in this book, this approach is a
pedagogical one. The chapter starts with the development and history of the
British school of intonation analysis. It highlights the fact that intonation
functions both within the formal linguistic system and outside it, which is
the background against which the British school approach emerged. The British
school relies mainly on auditory phonetic analysis of the patterns and
introspection about their meaning. One of its defining characteristics is the
use of dynamic pitch elements such as the rise and fall-rise, as opposed to
pitch levels. Nevertheless, its name reflects its limitations: although
contributions have been made by scholars around the world, much of the
development of the analyses that characterize the school can be attributed to
phoneticians working in the United Kingdom. The chapter also compares the
British school approach with AM theory. As for similarities, the two share an
essentially phonological approach, according to factors in the context in
which they occur. As for differences, the British school uses abstract pitch
changes as its primes, while AM uses abstract pitch levels or targets.

Chapter 10 is The PaIntE Model of Intonation, by Antje Schweitzer, Bernd
Möbius, Gregor Möhler, and Grzegorz Dogil. The parameterized intonation events
model (PaIntE: Möhler and Conkie 1998) was originally developed for F0
modelling in text-to-speech (TTS) synthesis. The PaIntE model assumes that
only the F0 contour in the vicinity of so-called intonation events contributes
to the intonational meaning of an utterance, whereas the stretches in between
these events arise from interpolation and do not affect the overall meaning.
PaIntE can also be used as a sequential model of intonation, in that it
composes the F0 contour from a sequence of local contours, each associated
with some kind of meaningful tonal event, and these events or local contours
do not interact or affect each other. The authors introduce the PaIntE model
and its parameters, highlighting how it can be applied in speech synthesis,
how to relate PaIntE to prosodic categories, and how PaIntE works in
intonation research as one of the exemplar-theoretic approaches. A key
advantage of the PaIntE model is that it enables faster and more consistent
automatic labelling of intonation categories, because it is able to
approximate the shapes of natural F0 curves in an analysis mode and to
generate F0 contours that sound convincingly like natural ones in a synthesis
mode. Lastly, the authors also present the
motivation behind the PaIntE modelling approach and its mathematical
formulation.

The last chapter, Chapter 11 by Yi Xu, Santitham Prom-on, and Fang Liu,
introduces The PENTA Model: Concepts, Use, and Implications. The authors
regard speech as a communication system and take the articulatory-functional
view of speech that forms the basis of the parallel encoding and target
approximation (PENTA) model (Xu 2005). In other words, PENTA is a theory of
how multiple layers of information are conveyed through prosody with a
controlled biomechanical system. It also treats how prosody works as a
communication system, how it can be learned, and how it goes through changes
over time. Differing from many other theories focusing on prosodic forms,
PENTA accounts for prosodic forms only as a by-product. The authors start from
the conceptual framework. TA stands for target approximation, which is
syllable-synchronized and sequential, and PE for parallel encoding, whereby
prosody conveys multiple layers of information simultaneously. Apart from its
theoretical implications,
PENTA is used as a major research tool for computational modelling. Thus, the
authors believe that PENTA has broader significance with regard to how prosody
operates as part of the speech communication process for computational
modelling, articulation, data-driven phonology, prosodic typology, and
perceptual modelling. Janet Pierrehumbert's commentary regards PENTA as a
third-generation model. However, by comparing AM and PENTA, she questions
whether PENTA supersedes the AM approach and to what extent it builds on
insights from the earlier approach. In their response to Pierrehumbert, the
authors argue that, compared with AM, PENTA is not a direct mapping model.
Instead, it focuses on speech functions first. Thus, they suggest that PENTA
is one of the most indirect models of prosody, as it explicates multiple
degrees of separation between meaning and continuous surface prosody.

EVALUATION
To summarize, this book provides a formal platform for discussion about
prosodic models and annotation toolkits. In other words, it covers prosodic
problems both theoretically and practically. Apart from introducing
phonological theories, it describes and critiques the most popular ones
through comparison. The book also highlights the practical side of prosody
in speech modelling. It seeks to address the following questions: 1) Which
model of prosody is seen as "correct" or "best" for practice? 2) Which
one of those models is better, and why? 3) How can those models be
applied? Overall, the book suggests that the answers depend on what the
user wants to do with the models, and that no single model answers all
possible needs, which is consistent with its practical aim. Additionally, the
chapters all serve the purpose of discussion, which is in line with the
introduction to this book. Specifically, some chapters are followed by
commentaries and authors' responses, allowing readers to form a critical view
of those theories.

The chapters in this book are largely coherent because the editors have set
six common questions for the author(s) of each chapter to answer. As the
introductory chapter explains, the six questions are meant to disclose the
objectives of each of the primary approaches to prosody that have been put
forward over the past few decades. By showing the similarities and differences
between each theory's goals, the chapters, with their common questions, make
it simpler for practitioners to choose the strategy that best fits their
requirements. However, the sequence of some chapters could be changed to make
the volume more coherent. For example, Chapter 4 introduces and discusses the
ToBI system, and Chapter 6 also addresses its problems. Thus, they could be
placed together for a more coherent discussion of ToBI.

Personally speaking, this book is a useful handbook about speech (and prosody)
technology, phonological theories, and speech perception. It is designed for
phoneticians and technicians who want to have an overview of different
prosodic theories and select the most suitable ones for their practice.
Nevertheless, I do not think this is a typical introductory book. It requires
readers to have some basic knowledge of phonology, phonetics, speech
processing, and prosodic models, although the volume editors asked the authors
to write for relative newcomers to the field of prosody.

Finally, as the editors and chapter authors suggest, this book also
welcomes both empirical and practical evidence from experiments and modelling
practice in support of the selected models.

REFERENCES
Breen, M., L. C. Dilley, J. Kraemer, and E. Gibson. 2012. Inter-Transcriber
Reliability for Two Systems of Prosodic Annotation: ToBI (Tones and Break
Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory
8. 277–312. 

Byrd, D., and E. Saltzman. 2003. The Elastic Phrase: Dynamics of
Boundary-Adjacent Lengthening. Journal of Phonetics 31. 149–180. 

Fujisaki, H. 2004. Information, Prosody and Modeling with Emphasis on
Tonal Features of Speech. In Proceedings of Speech Prosody, Nara, Japan, 2004,
edited by Bernard Bel and Isabelle Marlien; ISCA Archive.
http://www.isca-speech.org/archive/sp2004. 
                                         
Gao, M. 2009. Gestural Coordination among Vowel, Consonant and Tone Gestures
in Mandarin Chinese. Chinese Journal of Phonetics 2. 43. 

Hirst, D. J. 1987. La Représentation Linguistique des Systèmes Prosodiques:
Une Approche Cognitive (Habilitation Thesis). Université de Provence.
                                         
Hirst, D. J. 2005. Form and Function in the Representation of Speech Prosody.
Speech Communication 46. 334–347. 
                                         
Kohler, K. J. 1991. Prosody in Speech Synthesis: The Interplay between Basic
Research and TTS Application. Journal of Phonetics 19. 121–138. 

Ladd, D. R. 1996. Intonational Phonology. Cambridge: Cambridge University
Press. 
                                         
Möhler, G., and A. Conkie. 1998. Parametric Modeling of Intonation Using
Vector Quantization. In Proceedings of the Third ESCA/COCOSDA Workshop on
Speech Synthesis. 311–316. https://isca-speech.org/archive_open/ssw3/. 

Pierrehumbert, J. B. 1980. The Phonology and Phonetics of English Intonation
(Doctoral dissertation). Massachusetts Institute of Technology.

Silverman, K., M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price,
J. Pierrehumbert, and J. Hirschberg. 1992. TOBI: A Standard for Labeling
English Prosody. In Proceedings of the Second International Conference on
Spoken Language Processing. 867–870. Banff, Canada: ISCA. 

Xu, Y. 2005. Speech Melody as Articulatorily Implemented Communicative
Functions. Speech Communication 46(3–4). 220–251.


ABOUT THE REVIEWER

Wenxi Fei is a research assistant at the Speech and Language Sciences Lab of
the Hong Kong Polytechnic University. Before that, she graduated from
University College London, majoring in Language Sciences, with Distinction,
and from The Education University of Hong Kong, where she was on the
president's honours list. Her research interests are prosodic perception and
production, which she studies with interdisciplinary methods, including
acoustic analysis, behavioural measures and brain-imaging techniques. She has
published some work on how bilinguals perceive and produce prosody.




