25.3270, Qs: Inherent Vowel Quality and Perception of Stress
The LINGUIST List
linguist at linguistlist.org
Thu Aug 14 04:30:14 UTC 2014
LINGUIST List: Vol-25-3270. Thu Aug 14 2014. ISSN: 1069 - 4875.
Moderators: Damir Cavar, Indiana U <damir at linguistlist.org>
Malgorzata E. Cavar, Indiana U <gosia at linguistlist.org>
Reviews: reviews at linguistlist.org
Anthony Aristar <aristar at linguistlist.org>
Helen Aristar-Dry <hdry at linguistlist.org>
Mateja Schuck, U of Wisconsin Madison
Editor for this issue: Anna White <awhite at linguistlist.org>
Date: Thu, 14 Aug 2014 00:30:02
From: Magdalene Jacobs [magdalene.jacobs at gmail.com]
Subject: Inherent Vowel Quality and Perception of Stress
I have a question about how the perception of syllable stress may be affected
by inherent vowel quality. I would really appreciate it if anyone could share
their thoughts on my questions and/or direct me to some good literature on
this topic! I am a graduate student in Communication Disorders and do not have
a person with an advanced knowledge of phonetics that I can consult in my
department. First, I will just say what I found in my experiment, and then
give some more description.
My experiment was on word segmentation by English-speaking adults. Adults were
asked to listen to a nonsense language and then judge which of two trisyllabic
“words” sounded more like a word from the language. One of the trisyllabic
words was a “real word” in the language and the other word was a possible
misparsing of the speech stream. I used natural language stimuli.
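To make "a possible misparsing" concrete, here is a minimal Python sketch that lists the trisyllabic sequences straddling a word boundary. It uses the three example words given later in this message. The function name `part_words` and the rule that a word never follows itself are assumptions made for illustration, not details of the actual experiment.

```python
def part_words(words):
    """Return trisyllabic sequences that straddle a word boundary.

    `words` is a list of words, each a tuple of three syllables.
    For every ordered pair of distinct words (a, b), the stream "... a b ..."
    contains two possible misparsings: a2-a3-b1 and a3-b1-b2.
    """
    result = set()
    for a in words:
        for b in words:
            if a is b:
                continue  # assumption: a word is never immediately repeated
            result.add((a[1], a[2], b[0]))
            result.add((a[2], b[0], b[1]))
    return sorted(result)


# Example words taken from the stream illustration in this message.
words = [("ta", "di", "ke"), ("bo", "du", "ka"), ("to", "pi", "ga")]
candidates = part_words(words)

for syllables in candidates:
    print("-".join(syllables))
```

Each test trial would then pair one real word with one of these boundary-straddling sequences.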
I found that, after a 20 min exposure to a nonsense language, adults were more
likely to judge that a set of three syllables was a “word” if that word began
with a syllable containing a high vowel (e.g. “pi”) than if a “word” began
with a low or mid vowel (e.g. “ta” or “bo”). These results were
So, I am wondering why this might be the case. Here is some background:
For this experiment, I needed to create a “monotonous” speech stream for an
artificial language in which rhythm could not be a cue to word boundaries.
I wanted to use natural syllables for this experiment—rather than synthetic
syllables. My natural syllables were all CV. The speech stream went something
like this: “ta-di-ke # bo-du-ka # to-pi-ga # etc.” (where # indicates a word
boundary). There were no pauses between “words” in the speech stream. The
syllables were read individually by a female speaker and then strung together
in MATLAB to create the speech stream.
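The concatenation step can be sketched roughly as follows. This is a Python illustration using only the standard library, not the MATLAB script actually used. The sample rate, the sine-burst placeholder tokens, and the names `placeholder_token` and `build_stream` are all assumptions. The 0.30 s token length is the duration mentioned later in this message.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed, the message does not state a sample rate


def placeholder_token(freq_hz, dur_s=0.30, rate=SAMPLE_RATE):
    """Stand-in for one recorded CV syllable: a plain sine burst."""
    n = int(round(dur_s * rate))
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def build_stream(word_order, tokens):
    """Concatenate syllable tokens back to back.

    No silence is inserted anywhere, so word boundaries are not
    marked by pauses.
    """
    stream = []
    for word in word_order:
        for syllable in word.split("-"):
            stream.extend(tokens[syllable])
    return stream


# Example words from the stream illustration above.
words = ["ta-di-ke", "bo-du-ka", "to-pi-ga"]
syllables = [s for w in words for s in w.split("-")]

# In a real setup each entry would hold the samples of a recorded token.
tokens = {s: placeholder_token(180.0) for s in syllables}

stream = build_stream(words, tokens)
duration_s = len(stream) / SAMPLE_RATE
print(len(stream), "samples,", duration_s, "seconds")
```

Because the tokens are simply joined end to end, any cue to a boundary has to come from the tokens themselves rather than from timing gaps.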
I had a very hard time creating this language using natural tokens. To my
ear, and to the ears of my pilot subjects, it sounded as if word boundaries
occurred at syllables containing high vowels (e.g. “di” sounded like it
was the beginning of a word, as did “du” and “pi,” etc.).
I thought that this perception of these syllables as being the beginning of
words might be due to the inherently higher pitch of these syllables, relative
to the other syllables (such as “ta” or “bo”). Because of this, I re-selected
tokens from my recordings and tried to match them as closely as possible in
terms of pitch.
As in other experiments in this same area (word segmentation), I then made
small adjustments to volume and duration (again, in order to create
“monotonous” speech). I ended up mostly equalizing the intensity of each vowel
(64-66 dB) and the duration (0.30 s), as well as lowering the pitch of
the high vowels and raising the pitch of the low vowels. After modification,
the pitch range for the vowels was 177 Hz (for “to”) to 184 Hz (for “pu”).
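As a rough illustration of these adjustments, the Python sketch below scales a token to a target RMS level and expresses the remaining pitch range in semitones. RMS scaling is only one common way to equalize level and is an assumption here, since the message does not say which method or tool was used. The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_to_rms(samples, target_rms):
    """Return a copy of `samples` scaled so that its RMS equals `target_rms`."""
    current = rms(samples)
    if current == 0:
        return list(samples)  # silent input: nothing to scale
    gain = target_rms / current
    return [x * gain for x in samples]


def db_difference(rms_a, rms_b):
    """Level difference in dB between two RMS amplitudes."""
    return 20 * math.log10(rms_a / rms_b)


def semitones(f_low_hz, f_high_hz):
    """Interval between two frequencies in semitones."""
    return 12 * math.log2(f_high_hz / f_low_hz)


# Pitch range reported above: 177 Hz to 184 Hz.
pitch_span = semitones(177.0, 184.0)
print(f"{pitch_span:.2f} semitones")

# Toy token (made-up samples) scaled to an arbitrary target level.
toy = [0.5, -0.5, 0.25, -0.25]
scaled = scale_to_rms(toy, 0.1)
```

The 177-184 Hz range works out to about 0.67 semitones, which is well under one semitone.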
I then ran more subjects using this language. My subjects were still
significantly more likely to judge an item to be a “word” from the language
they had been exposed to when it began with a syllable containing a high
vowel than when it began with a syllable containing a mid or low vowel.
Specifically, if asked to judge either “pi-ga-to” or “ga-to-pi” as a word,
subjects were statistically more likely to choose the first. These results
were quite robust.
Additionally, from my own perspective, I always heard the high vowel syllables
as “more stressed” when I listened to the speech stream. What is perplexing to
me is that, even though the pitch range was quite small between the high and
low vowels (after manipulation), the high vowels still sounded more prominent
in the speech stream.
My preliminary idea regarding these results is that, by equalizing the volume
of the vowels, I may have made the high vowels sound more prominent, as they
should be inherently less loud than the low vowels.
So here are my questions:
I am wondering if anyone could point me to some work on the perception of
stress in English--particularly work that addresses:
1) Whether, in the absence of other cues, inherent pitch will make a syllable
sound more prominent
2) Whether the interplay between acoustic correlates of stress (pitch,
duration, loudness) is such that, by bringing vowels closer together in
acoustic space in terms of both pitch and volume, high vowels may have been
perceived to be more “stressed” because they are inherently quieter than low
vowels
I've tried to explore the literature on my own and have found myself a little
bit lost, since I am not a phonetician. I would very much appreciate any
advice on this matter!
Thank you very much!
Linguistic Field(s): Phonetics