8.954, Review: Liberman: Speech: A Special Code

linguist at linguistlist.org
Sat Jun 28 05:21:42 UTC 1997


LINGUIST List:  Vol-8-954. Sat Jun 28 1997. ISSN: 1068-4875.

Subject: 8.954, Review: Liberman: Speech: A Special Code

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Sue Robinson <sue at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Andrew Carnie <carnie at linguistlist.org>
 ==========================================================================

What follows is another discussion note contributed to our Book Discussion
Forum.  We expect these discussions to be informal and interactive; and
the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for discussion."  (This means that
the publisher has sent us a review copy.)  Then contact Andrew Carnie at
     carnie at linguistlist.org

 ==========================================================================




Liberman, Alvin M. (1996). Speech: A special code. Learning, Development,
and Conceptual Change series. Cambridge, MA: MIT Press. 458 pages. ISBN
0-262-12192-1.

Reviewed by Stefan A Frisch <safrisch at indiana.edu>


This book is a collection of articles representing 50 years of speech
research at Haskins Laboratories undertaken by Alvin Liberman and
colleagues.  It is divided into ten sections covering a range of issues
that have guided Liberman's research and theorizing on the process of
speech perception.


Synopsis:

Chapter 1.

The first chapter is a new article which reviews the historical progress
of Liberman's research and the development of the Motor Theory of speech
perception.  This chapter is both introduction and conclusion to the
compilation of articles in the remainder of the book.  In it, Liberman
sets forth quite clearly two different views of speech perception: the
"horizontal" view and the "vertical" view. The horizontal view, which
Liberman equates with an auditory view, is that speech perception engages
no special mechanisms unique to speech. The vertical view, which he
supports, is that there is a speech perception module which operates
independently from the process of general auditory perception.  He also
associates the vertical view with the hypothesis that speech perception
involves the hearer perceiving the articulatory gestures made by the
speaker directly and not the acoustic result of those gestures.  This
chapter is extremely valuable both to the specialist and the novice, as a
unifying work that brings together an entire research program in a single
place.

Part 1. On the Spectrogram as a Visible Display of Speech

This section contains one article, a brief description of an early
attempt to convert between visual and auditory stimuli using the
spectrograph and pattern playback, applied both to speech and to simple
geometric shapes.  It reveals quite clearly Liberman's early horizontal
view that there are modality-independent properties of pattern perception
that would apply equally well to speech as to any other pattern, no
matter how artificial.

Part 2. Finding the Cues

There are seven articles in this section, detailing experimental work on
the auditory cues for phoneme identification in English.  One of the
major findings of these experiments is that there is no unitary acoustic
invariant for a phoneme which corresponds to the unitary perceptual
experience of the listener.  In addition, there are discontinuities in
the acoustic categories for different phonemes.  These chapters are also
quite useful in that they detail the basic acoustic properties of a
variety of English phonemes.  The topics covered include the release
burst of word-initial stops as a place cue, the direction and duration of
CV formant transitions as cues for place and manner, the abrupt first
formant onset as a cue for voiceless stops, and a summary article
containing an early description of the rules needed to synthesize
phonemically contrastive English.

Part 3. Categorical Perception

One of the articles in this section, "The Discrimination of Speech Sounds
within and across Phoneme Boundaries", should be read by every student of
speech.  The authors compared the ability of subjects to label
synthesized syllables (containing onsets on an acoustic continuum of stop
place of articulation from "bay" to "day" to "gay") with their ability to
discriminate those same syllables.  They found acute discriminability
between phoneme categories and poor discriminability within categories, a
pattern which has come to be called "categorical perception".  The
analytical techniques and conclusions drawn in this article gave rise to
what is now a large literature on the categorical perception of speech
and non-speech by both humans and animals. The second article in this
section details an attempt to compare speech with an equivalent
non-speech control on the categorical perception of intervocalic stop
closure duration, which can be used to distinguish "rapid" from "rabid".
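
As a concrete illustration of how labeling and discrimination are related
in this paradigm, the sketch below (in Python, with invented labeling
probabilities; none of the numbers come from the articles under review)
computes the standard prediction that a pair of stimuli is discriminated
correctly when the two receive different labels, and only at chance when
they receive the same label.

    # Minimal sketch: predicting ABX discrimination from labeling
    # (identification) probabilities.  All numbers are invented for
    # illustration and are not taken from the book under review.

    def predicted_abx_correct(p1, p2):
        # p1, p2: probability that each stimulus is labeled, say, "day".
        # Different labels -> correct; same label -> guessing (0.5).
        p_different = p1 * (1 - p2) + p2 * (1 - p1)
        return p_different + (1 - p_different) * 0.5

    # Hypothetical labeling probabilities along a "bay"-to-"day" continuum.
    labeling = [0.02, 0.05, 0.15, 0.85, 0.95, 0.98]

    # Predicted discrimination of adjacent pairs peaks at the category
    # boundary (between steps 3 and 4) and stays near chance elsewhere.
    for i in range(len(labeling) - 1):
        print(f"pair {i + 1}-{i + 2}: "
              f"{predicted_abx_correct(labeling[i], labeling[i + 1]):.2f}")

Finding that observed discrimination matches this labeling-based
prediction, rather than exceeding it within categories, is the pattern
the article treats as the signature of categorical perception.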

Part 4. An Early Attempt to Put It All Together

This section contains the article "Some Results of Research on Speech
Perception" which presents what Liberman now calls the Early Motor
Theory.  In this model, the objects of speech perception are the
articulations that create the acoustic patterns, which Liberman assumes
make more coherent categories than the acoustic patterns.

Part 5. A Mid-Course Correction

The article in this section, "Perception of the Speech Code", is a review
from ten years after the previous section.  This article contains two
major changes from the Early Motor Theory.  The revised theory proposes
the direct perception of articulatory events, without an intermediate
auditory stage of processing.  It also argues for a special speech mode
in which this perception occurs, to account for differences between
speech and non-speech perception. In particular, experiments on duplex
perception show that dichotically presented parts of a syllable (e.g. an
ambiguous "base syllable" and a crucial formant transition that
differentiates [da] from [ga]) are unconsciously and uncontrollably fused
into a complete percept.  Also, conflicting auditory and visual
information is integrated to produce a single perception (e.g. hearing
[ba] while seeing a face produce [ga] results in a percept of [da]).

Part 6. The Revised Motor Theory

This section contains the 1985 article "The Motor Theory of Speech
Perception Revised".  This is another article which everyone with an
interest in speech should read.  This article is valuable in setting the
Motor Theory apart from the more general theories of ecological
psychology.  This article also places the Motor Theory in the context of
Fodor's writings on modularity.  When compared to the previous two
sections, it is most fascinating to see how the rest of the field had
developed and "caught up", giving Liberman something more concrete to
which to compare the Motor Theory.  In this incarnation, the percepts of
speech are the intended articulatory gestures of the speaker, which are
perceived by a biologically specialized module.

Part 7. Some Properties of the Phonetic Module

The article in this section places the speech perception module of the
Motor Theory in the context of other modules of perception and
communication.  Like other perceptual modules, the phonetic module
preemptively processes stimuli, so that speech is not ordinarily
perceived both as speech and as a collection of non-speech noises.  Also,
the objects of speech perception (the gestures) are radically different
from the stimulus (the signal).  This is much like the perception of
three-dimensional depth, for example, from the integration of two
two-dimensional retinal images.

Part 8. More about the Function and Properties of the Phonetic Module

The article in this section further discusses the modularity of speech
perception, and also contributes to theories of modularity in general by
proposing a difference between "open" and "closed" modules, and
properties particular to each.  Perception of linguistic contrasts
utilizes a closed module with a discrete set of percepts.  Perception of
depth by stereoscopic vision utilizes an open module that can perceive
continuous range of depth.  The authors claim that when two modules
compete for the same stimulus, processing by the closed module occurs
before processing by the open module.

Part 9. Auditory vs. Phonetic Modes

This section contains seven articles which delve more deeply into the
difference between speech and general auditory perception, from a variety
of perspectives.  The topics include the perception of linguistic
categories from another language, trading relations between cues for a
phonemic contrast, and acoustically appropriate non-speech controls.

Part 10. Reading/Writing Are Hard Just Because Speaking/Listening Are Easy

The final article in the book argues that the horizontal view predicts
that reading and writing should be easier than speaking and listening.
The Motor Theory and the vertical view predict that speech is primary due
to the biological specialization of the speech perception module.


Critical evaluation:

There are three points highlighted in the more theoretical chapters of
Liberman's book that I would like to address.  First, he proposes there
is a speech perception module, biologically specialized to process
speech.  Second, he proposes that the percepts of speech are not
auditory, but rather that they are articulatory.  Third, he argues that
were these not the case, we would expect reading and writing to be easier
than speaking and listening, when in fact speaking and listening are
easier.  I consider each point in turn.

A variety of experiments showing that speech is processed differently
from non-speech provide evidence for a specialized speech perception
module.  However, it is uncertain whether these experiments consider
appropriate non-speech controls to compare to speech.  While a number of
ways of creating complex signals which are more or less acoustically
equivalent to speech are considered, these experiments do not explore
whether there are controls which are communicatively or informationally
equivalent to speech.  Fowler & Rosenblum (1990) found that a natural
sound, the sound of a door slamming, patterned more like speech, and
differently from laboratory generated non-speech controls (which are
artificial sound patterns). A door slam is ecologically relevant, as it
gives the hearer information about an action which occurred in the
world.  Speech has tremendous social significance and is probably the
most highly practiced complex perceptual task performed by humans.  These
factors have not been adequately considered when explaining differences
between speech and non-speech perception.  While it may be the case that
speech is processed by a special mechanism, we cannot exclude the
possibility that this mechanism also processes some types of non-speech
sounds.

A second claim of the Motor Theory of speech perception is that the
percepts of speech are not the acoustic signals which impinge directly
upon the ear, but rather that the percepts are the distal articulations
made by the speaker.  One of Liberman's first findings was that there
is no acoustic invariant which corresponds to the perceptual invariant of
the phoneme or segment.  It is now well known, and admitted by Liberman,
that the articulatory gestures and even their motor commands are not
invariant either.  In the revised theory, the articulatory percepts are
assumed to be the speaker's intended gestures, before contextual
adjustments.  However, abstracting the percept to this degree undermines
the claim that the percepts are articulatory.  The percepts might as well
be entirely abstract phonemic categories.

Another, more striking finding from Liberman's early experiments is that
there are discontinuities in the acoustic to phonemic mapping for onset
consonants.  These discontinuities were taken as additional evidence
against an acoustic basis for phoneme categories.  Other researchers have
found that for some phonemic categories the acoustic mapping is simple
while the articulatory mapping is complex.  For example, American English
/r/ can be produced with one or more of three distinct gestures, and
there is intraspeaker variation in which gestures are used (Delattre &
Freeman 1968; Hagiwara 1995; see also Johnson, Ladefoged, & Lindau
1993).  With neither acoustic nor articulatory categories providing
simple dimensions upon which to base the perceptual category, once again
we are led to more abstract invariant percepts.  The coherence as
categories of these abstractions can be based on either articulatory or
acoustic properties, or both.  This conclusion accords well with
linguistic theory, where abstract segments or phonemes are generally
accepted in some form, and where phonological processes exist which need
to be described both by articulatory features (Chomsky & Halle 1968,
Clements 1985) and by acoustic features (Jakobson, Fant, & Halle 1965;
Flemming 1995).

Finally, Liberman claims that, since the perceptual and productive
mechanisms for reading and writing, the eyes and hands, are much more
sensitive and agile than those for speech, reading and writing should be
simpler than speech.  He rightly points out that reading and writing must
be taught, and are learned only with difficulty by many, which suggests
that there is something special about speech.  Indeed, speech is special,
but it has cognitive and evolutionary advantages over reading and writing
which more than offset the other advantages of reading and writing.  For
example, the unfolding of a linguistic message in speech is naturally
determined by the flow of time, whereas writing is arbitrarily
directional so the direction of reading can be determined only by
convention.  Reading and writing also require the use of an additional
medium, such as paper or a patch of dirt, and so from an evolutionary
point of view reading and writing are at a disadvantage.  Rather than
reading and writing, we should consider sign language when looking for a
visual equivalent to speech.  In fact, sign is learned by deaf children
of signing parents just as easily and automatically as speech is learned
by hearing children of speaking parents, and some researchers believe
sign language does have an acquisition advantage (see Newport & Meier
1985, Meier & Newport 1990, Volterra & Iverson 1995 for discussion).
Sign languages are the equals of oral languages in linguistic complexity
and arbitrariness, and their existence shows that much of what is special
about speech does not depend specifically on the ear and vocal tract.

In summary, Liberman's articles provide strong evidence that speech is
special, and processed differently and preemptively by a mechanism that
has many of the properties of a modular system. However, much of what is
special in speech is also found in sign language and in other
ecologically relevant sounds.  Arguments for a biological specialization
for speech perception as articulatory are based on an overly restricted
range of evidence.  In the broader perspective, speech is special because
it is an integral part of natural language.  This book is an informative
and provocative study of that very important facet of language.


References:
Chomsky, N. & M. Halle (1968). The sound pattern of English. Cambridge,
MA: MIT Press.

Clements, G. N. (1985). The geometry of phonological features. Phonology
Yearbook 2: 225-252.

Delattre, P. & D. Freeman (1968). A dialect study of American r's by
X-ray motion picture. Linguistics 44: 29-68.

Flemming, E. (1995). Auditory representations in phonology. Unpublished
Ph.D. Thesis, UCLA.

Hagiwara, R. (1995). Acoustic realizations of American /r/ as produced by
women and men. Ph.D. Thesis, UCLA, published as UCLA Working Papers in
Phonetics 90.

Jakobson, R., G. Fant, & M. Halle (1952). Preliminaries to speech
analysis. Cambridge, MA: MIT Press.

Johnson, K., P. Ladefoged, & M. Lindau (1993). Individual differences in
vowel production.  Journal of the Acoustical Society of America 94(2):
701-714.

Meier, R. & E. Newport (1990). Out of the hands of babes: On a possible
sign advantage. Language 66(1): 1-23.

Newport, E. & R. Meier (1985). The acquisition of American Sign Language.
In D. Slobin (ed.), The crosslinguistic study of language acquisition,
volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum. 881-938.

Volterra, V. & J. Iverson (1995). When do modality factors affect the
course of language acquisition?.  In K. Emmorey & J. Reilly (eds.),
Language, gesture, and space. Hillsdale, NJ: Lawrence Earlbaum. 371-390.


Reviewer:
Stefan Frisch, NIH Post-Doctoral Research Fellow, Speech Research
Laboratory, Indiana University. Ph.D. in Linguistics. Research interests
include phonetics, phonology, and psycholinguistics (a.k.a. laboratory
phonology) and the language/cognition interface.


Acknowledgment:
Thanks to David Pisoni, Sonya Sheffert, and Richard Wright for comments
and discussion of this work.


Reviewer's address:
Stefan Frisch
Speech Research Laboratory
Psychology Department
Indiana University
Bloomington, IN 47405
safrisch at indiana.edu
http://www.indiana.edu/~srlweb/staff/frisch.html



---------------------------------------------------------------------------
LINGUIST List: Vol-8-954


