17.549, Review: Forensic Ling/Phonetics: Alderman (2005)

Mon Feb 20 04:51:54 UTC 2006

LINGUIST List: Vol-17-549. Sun Feb 19 2006. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Lindsay Butler <lindsay at linguistlist.org>
================================================================  

What follows is a review or discussion note contributed to our 
Book Discussion Forum. We expect discussions to be informal and 
interactive; and the author of the book discussed is cordially 
invited to join in. If you are interested in leading a book 
discussion, look for books announced on LINGUIST as "available 
for review." Then contact Sheila Dooley at dooley at linguistlist.org. 

===========================Directory==============================  

1)
Date: 16-Feb-2006
From: David Deterding < dhdeter at nie.edu.sg >
Subject: Forensic Speaker Identification 

-------------------------Message 1 ---------------------------------- 
Date: Sun, 19 Feb 2006 23:49:37

AUTHOR: Alderman, Tony Brian
TITLE: Forensic Speaker Identification
SUBTITLE: A Likelihood Ratio-based Approach Using Vowel Formants
PUBLISHER: Lincom GmbH
YEAR: 2005
Announced at http://linguistlist.org/issues/16/16-1983.html 

David Deterding, NIE/NTU, Singapore 

OVERVIEW

The ability to identify speakers from samples of speech is 
becoming increasingly important in forensics, for example to 
determine who made a telephoned bomb threat or whether recorded 
kidnapping demands were made by a particular suspect. This book 
considers the effectiveness and reliability of various methods of 
identifying and discriminating between speakers based on 
measurements of the second and third formants (F2 and F3) of the 
five long monophthongs of eleven male Australians recorded on two 
separate occasions. It does so particularly by considering the 
Likelihood Ratio (the probability of the observed evidence if the two 
samples come from the same speaker, divided by its probability if 
they come from different speakers).
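
To make the idea of a Likelihood Ratio concrete, here is a minimal 
Python sketch. It assumes a single formant value modelled with 
univariate normal distributions, which is a simplification and not 
the Aitken formula that the book uses. All names and numbers are 
hypothetical and are not taken from the book.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x, suspect_mean, suspect_sd, pop_mean, pop_sd):
    """p(evidence | same speaker) / p(evidence | different speaker).

    Simplifying assumption: the suspect and the background population
    are each described by one normal distribution.
    """
    same = normal_pdf(x, suspect_mean, suspect_sd)
    different = normal_pdf(x, pop_mean, pop_sd)
    return same / different


# Hypothetical F2 values in Hz, invented for illustration only.
lr = likelihood_ratio(
    x=1500.0,            # value measured in the questioned recording
    suspect_mean=1520.0,
    suspect_sd=60.0,
    pop_mean=1700.0,
    pop_sd=150.0,
)

# Bayesian updating: posterior odds = prior odds * likelihood ratio.
prior_odds = 0.1         # hypothetical prior odds
posterior_odds = prior_odds * lr

print(f"LR = {lr:.2f}, posterior odds = {posterior_odds:.2f}")
```

A value above 1 supports the same-speaker hypothesis and a value 
below 1 supports the different-speaker hypothesis; the ratio says 
nothing by itself about the prior odds.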

In the following discussion, 
/i/ refers to the vowel in 'heed', 
/u/ to the vowel in 'who'd' (in the book, it is shown as a centralised 
vowel, a "barred-u"), 
/o/ to the vowel in 'hoard', 
/a/ to the vowel in 'hard' (shown as an open central vowel, a "turned-a"), 
and 
/3/ to the vowel in 'herd' (a mid central vowel).

SYNOPSIS

Chapter 1 introduces the issues of forensic phonetics and then 
provides an overview of the book. In chapter 2, the basic principles of 
probabilistic forensic speaker identification are discussed and the use 
of formants for representing vowels is considered. Chapter 3 covers 
issues concerned with probability theory including the Bayesian 
approach to evaluating evidence and also the assessment of 
normality. The characteristics of the normal distribution are discussed 
in more detail in chapter 4, including statistical measures of deviation 
from normality such as skew and kurtosis (the degree to which data is 
clustered about the mean). In chapter 5, the vowels of Australian 
English are presented with particular reference to the Bernard data 
set of measurements of the vowels of 170 male speakers. The 
acoustic representation of the five long monophthongs of Australian 
English from the Bernard data set is evaluated in chapter 6, and then 
the recordings on which this study is based are described in chapter 
7. Chapter 8 then presents the results of the current study, comparing 
the relative success of F2 and F3 of each of the five long vowels in 
separating out the eleven speakers, and then chapter 9 considers the 
implications of the study and discusses the way forward.
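
The skew and kurtosis measures mentioned for chapter 4 can be 
illustrated with a minimal Python sketch. It uses the standard 
moment-based definitions, which may differ from the estimators used 
in the book, and the F2 values are hypothetical.

```python
def skew_and_excess_kurtosis(values):
    """Return (skew, excess kurtosis) from population central moments.

    skew            = m3 / m2**1.5
    excess kurtosis = m4 / m2**2 - 3   (0 for a normal distribution)
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical F2 values in Hz with one high outlier, so the
# distribution has a long right tail (positive skew).
f2_values = [1500, 1520, 1540, 1560, 1580, 1600, 1900]
skew, kurt = skew_and_excess_kurtosis(f2_values)
print(f"skew = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

A positive skew means the tail extends towards higher values, and an 
excess kurtosis near zero is what a normal distribution would give.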

CRITICAL EVALUATION

This is a short book packed with data. In fact, nearly half of the 143 
pages consist of the full tabulated measurements of F2 and F3 of the 
eleven speakers and also the results for the effectiveness of each of 
the parameters in discriminating between the speakers. While it is 
highly commendable that so much detailed information is provided, 
and indeed the lengthy tables do allow the reader to get a real feel for 
the data (and also to check all the results, should one choose to), it 
does sometimes get a bit overwhelming, particularly in chapter 8 when 
a comparison of the effectiveness of each of the parameters is 
presented first in isolation and then in various combinations.

Sometimes one wishes that more interpretation were provided. For 
example, on page 44 we find that four out of five of the vowels have a 
positively skewed distribution for F2. But why is this so? And what is it 
about /u/ that makes it different from the others? Then we are told that 
the F2 of /o/ is bimodal (pp. 45-6) for the Bernard data. Does this 
mean there are two different realisations of the vowel in Australia, one 
fully back and one less so? Or maybe there is some kind of instability 
in the measurement? We learn on page 60 that the F-ratio for the 
distribution of F2 for /o/ is low for the eleven speakers in this study, 
which indicates that the between-speaker variation is relatively small 
but the within-speaker variation is high for the F2 of this vowel. But 
why? Is it perhaps related to the bimodality of the F2 of /o/? In chapter 
8 (p. 66) we are shown that, when using the Aitken formula for 
estimating the Likelihood Ratio, a smoothing factor of 0.05 is best for 
the F2 of all the vowels except /u/ and a smoothing factor of 0.4 is 
best for the F3 of all the vowels except /u/. But what is it about /u/ that 
results in a need for the distribution of its F2 to be smoothed more 
than that of the other vowels while its F3 needs to be smoothed less? All 
these questions seem to be crying out for further interpretation.
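
The F-ratio point can be illustrated with a minimal Python sketch. It 
assumes the usual one-way ANOVA definition (between-speaker mean 
square divided by within-speaker mean square), which may not be 
exactly how the book computes it, and the data are hypothetical.

```python
def f_ratio(groups):
    """One-way ANOVA F: between-group over within-group mean square.

    `groups` is a list of lists, one list of measurements per speaker.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical F2 values in Hz for three speakers.
# Case 1: speakers differ from each other, each one is consistent.
separated = [[1000, 1010, 1020], [1200, 1210, 1220], [1400, 1410, 1420]]
# Case 2: speakers are alike on average, each one varies a lot.
overlapping = [[1000, 1200, 1400], [1010, 1210, 1410], [1020, 1220, 1420]]

f_separated = f_ratio(separated)
f_overlapping = f_ratio(overlapping)
print(f_separated, f_overlapping)
```

The second case corresponds to the situation described for the F2 
of /o/: a low F-ratio, because speakers differ little from one 
another while each speaker varies a great deal.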

We might consider one aspect of the representation of the five vowels 
a bit further. On page 43, we are shown a scatter plot of F1 against 
F2 for the five vowels, and it appears that the range of F2 for /u/ is 
about the same as for /i/. But this is partly an artifact of the scales 
used: in percentage terms, a range of 1200 to 1800 Hz (for /u/) is in 
fact substantially larger than a range of about 2000 to 2600 Hz 
(for /i/). If, instead of linear Hertz scales, the plots were shown on 
auditory Bark scales (as is common in acoustic representations 
nowadays), the range of F2 for /u/ would be shown as larger than that 
for /i/, and this might more accurately reflect the fact that there is 
indeed substantial variation in the degree of fronting for /u/ in many 
varieties of English, including Australian.
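
The arithmetic behind this point can be checked with a minimal Python 
sketch. It assumes the Traunmueller (1990) approximation for 
converting Hertz to Bark, which is one of several conversion formulas 
in use, and it takes the approximate ranges quoted above as input.

```python
def hz_to_bark(f_hz):
    """Hertz to Bark using the Traunmueller (1990) approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def describe_range(low_hz, high_hz):
    """Return the span in Hz, as a percentage of the lower bound,
    and in Bark."""
    span_hz = high_hz - low_hz
    percent = 100.0 * span_hz / low_hz
    span_bark = hz_to_bark(high_hz) - hz_to_bark(low_hz)
    return span_hz, percent, span_bark


# Approximate F2 ranges quoted in the review.
u_hz, u_percent, u_bark = describe_range(1200.0, 1800.0)   # /u/
i_hz, i_percent, i_bark = describe_range(2000.0, 2600.0)   # /i/

print(f"/u/: {u_hz:.0f} Hz, {u_percent:.0f}%, {u_bark:.2f} Bark")
print(f"/i/: {i_hz:.0f} Hz, {i_percent:.0f}%, {i_bark:.2f} Bark")
```

Both ranges span 600 Hz, but the range for /u/ is 50% of its lower 
bound against 30% for /i/, and on this Bark conversion it is roughly 
2.65 Bark against roughly 1.75 Bark.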

One further issue arises with regard to the data. A substantial quantity 
of speech was recorded: 2 recordings on different occasions of 4 
repetitions of 24 sentences, so the research is based on an 
impressively large set of vowel measurements. However, it is not clear 
why for two of the vowels the words were kept consistent, with a fixed 
hVd frame, but for each of the other three vowels, another 
phonological frame was included: for /i/, 'deed' was used in addition to 
three instances of 'heed'; for /a/, 'card' occurs in addition to three 
instances of 'hard'; and for /o/, 'board' was recorded in addition to 
three instances of 'hoard'. Does this not mean that the influence of the 
initial consonant might have increased the variation for /i, a, o/ 
compared to the other two vowels? 

In fact, one might also question whether the general reliance on a 
fixed 'hVd' word shape might not substantially underestimate the 
degree of variation that exists in the vowels that occur in real speech 
data, for the degree of coarticulation from initial and final consonants 
may actually be quite significant.

Nevertheless, this book does present some fascinating and 
exceptionally valuable results in an important area of research. The 
data is carefully presented even if the interpretation might have been 
elaborated a little, and the foundations for the research are 
well-grounded even if there are one or two questions one might ask 
about the data. Many people working to establish the theoretical and 
practical foundations of forensic speaker identification will find the 
thoughtful consideration of so many statistical issues very useful. This 
book undoubtedly makes a significant contribution to the growing body 
of work on forensic phonetics, and indeed many linguists who are 
interested in how vowels should be represented will also find it 
informative and interesting. 

ABOUT THE REVIEWER

David Deterding is an Associate Professor at NIE/NTU, Singapore, 
where he teaches phonetics, phonology, syntax, and Chinese-English 
translation.
