19.1853, Qs: Hierarchy of Variation in Natural Speech

Wed Jun 11 16:02:21 UTC 2008

LINGUIST List: Vol-19-1853. Wed Jun 11 2008. ISSN: 1068 - 4875.

Subject: 19.1853, Qs: Hierarchy of Variation in Natural Speech

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Catherine Adams <catherin at linguistlist.org>

We'd like to remind readers that the responses to queries are usually
best posted to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list. This policy was
instituted to help control the huge volume of mail on LINGUIST; so we
would appreciate your cooperating with it whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it
is usually a good idea to personally thank those individuals who have
taken the trouble to respond to the query.


Date: 11-Jun-2008
From: Cassie Mayo < catherin at ling.ed.ac.uk >
Subject: Hierarchy of Variation in Natural Speech


-------------------------Message 1 ---------------------------------- 
Date: Wed, 11 Jun 2008 12:00:51
From: Cassie Mayo [catherin at ling.ed.ac.uk]
Subject: Hierarchy of Variation in Natural Speech

I'm working on a project looking at the process of subjective evaluation of
speech synthesis (that is, we're not evaluating, but rather determining
what listeners do when they evaluate).  We have found (probably
unsurprisingly) that the acoustic information that listeners are influenced
by in judging something like "naturalness" of synthetic speech falls into
a hierarchy -- listeners are more influenced by some sorts of information
than others.  In very general terms, the hierarchy seems to be: presence of
artifacts (due to join discontinuities, etc.) has more influence than
segmental quality, which in turn has more influence than intonation
appropriateness.

Intuitively, this looks to me like the opposite of what would be considered
acceptable variation in natural speech; that is, listeners will accept a
great deal of variation in intonation, somewhat less variation in segmental
quality, and much less (no?) variation in terms of presence of artifacts
(pops and clicks, rather than repairs and restarts).

Has anyone come across any references that might support this intuition? 

Linguistic Field(s): Computational Linguistics
                     Forensic Linguistics
