14.876, Sum: Speaker Recognition Tapes Given in Evidence

Wed Mar 26 17:06:04 UTC 2003

LINGUIST List:  Vol-14-876. Wed Mar 26 2003. ISSN: 1068-4875.

Subject: 14.876, Sum: Speaker Recognition Tapes Given in Evidence

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Steve Moran <steve at linguistlist.org>
 ==========================================================================
=================================Directory=================================

1)
Date:  Wed, 26 Mar 2003 09:28:50 +1200
From:  Fay Wouk <f.wouk at auckland.ac.nz>
Subject:  Summary:  speaker recognition in audio tapes given in evidence

-------------------------------- Message 1 -------------------------------

Date:  Wed, 26 Mar 2003 09:28:50 +1200
From:  Fay Wouk <f.wouk at auckland.ac.nz>
Subject:  Summary:  speaker recognition in audio tapes given in evidence

A couple of weeks ago I posted this query (Linguist 14.751):

I have been approached by a local law firm for assistance in a court
case, but do not have the expertise required. They came to me because
I do conversation analysis, but this is really something different,
although it does relate to recorded conversation.  The situation is as
follows: police witnesses are making claims about the identity of
speakers for individual turns on an audio tape being used as
evidence. The law firm feels that the assignment of speakers to turns
is being done in an arbitrary fashion, and doubts its accuracy. It is,
of course, crucial to the case to know who said what.  They would like
an expert witness who could say why the accuracy is
questionable. While I know from personal experience that it can be
difficult to identify the speaker of certain turns in multi-party
conversation, they want actual scientific explanations for why
impressionistic identity assignment might be a problem, and how one
can accurately assign identity. (I have not yet listened to their
tapes, but will have an opportunity to do so.)  If anyone has any
experience with such matters, or knows of any published material
relating to it, or has any ideas about how to go about doing this,
please contact me. If you think you might be able to help, but you're
still not really sure what I'm asking, or need more detail, please
contact me with clarification questions.

***********************************************************

The responses I received fall into several categories. Some people
offered their services, and I thank them. However, the law firm has no
desire to pay for experts coming in from overseas. I have not included
those offers in the summary.  Michael Erard suggested posting to
forensic-linguistics at jiscmail.ac.uk, and Carsten Otto forwarded my
post to that list.  A large group of responses related to who to
contact, and what web sites or journals to look at. A few of those
contained specific references.  Another group of responses discussed
the issue, suggesting how to do voice recognition or outlining some of
the issues involved in it. The messages in those two groups are given
below, divided into the two categories. I wish to thank all the people
who took the time to write those responses.

For those who asked for an update about the case, here it is. I turned
all the posts I received over to a former student, Bronwen Innes,
whose PhD topic was in the area of language and the law, and who was
happy to act as expert witness. She is following up various of the
sources and references listed below. The law firm is providing her
with extracts from the tape, so that she can comment on the quality of
the recording and the difficulty of identifying speakers, both in
general and on this tape. They do not want to give her the relevant
sections of the tape, where their client is speaking. Apparently they
don't want to take the chance of her saying that the police were
accurate in this instance. She's hoping that perhaps after she
testifies the judge will order the tapes to be given over to her for
more expert analysis, but that may never happen.

Responses relating to who to contact or where to look for help.

You might try to get in touch with Diana Eades (applied linguistics at
the University of Hawaii). She is an Australian who has done lots of
work of the type you describe.

Barbara Horvath

****************************************************

I would suggest you try to contact Prof. Diana Eades, who is
Australian and used to be at University of New England, Armidale, NSW
2551 with this e-mail: deades at metz.une.edu.  However, for a while she
was at the U of Hawaii at this e-mail: eades at hawaii.edu. I haven't
been in direct touch with her for some time, so I can't tell you her
present location and affiliation. However, I would bet that any of the
Australian linguists who do forensic linguistics, like John Gibbons,
would know her whereabouts. The last e-mail I have for him is:
john.gibbons at linguistics.usyd.edu.au. He may also be able to connect
you with linguists in Australia (closer than Chicago!)  who can help
with this problem. Then there are two UK linguists, John Baldwin and
Peter French, who wrote a whole book called "Forensic Phonetics"
(1990: Pinter), who should be able to advise you.

Judith Levi

****************************************************

I can't answer your question directly, but I can give you some
leads. I would go first to a forensic phonetics lab. I don't know who
in New Zealand does that kind of work, but others would. You might
contact Peter French in the UK at jpf at jpfrench.demon.co.uk. You may
also want to contact Ton Broeders in the Netherlands at
t.broeders at nfi.minjus.nl. Ton works for the government there. If you
ask him whether he knows who is doing good work in this field in New
Zealand for a client other than the gov't, he will tell you if he
knows.

Larry Solan

****************************************************

Try this website: http://www.owlinvestigations.com  It may be too late
or too expensive, but this company does what you want.

Pete Unseth

****************************************************

Yes, it's known to be a very hard problem, and (not surprisingly,
since the human auditory apparatus is good at normalizing away speaker
variations) something which humans are not particularly good at - it's
one task that computers are typically better at than humans!

I know that James and Janet Baker have done some consulting of this
type. They can be reached at

Janet_Baker at email.com

and

jim at sandboxscribe.com

 - - Jonathan Young

****************************************************

I know nothing definitive about it but we have all heard about "voice
print" identification. I googled about 460 hits on it, saw some
courses being taught and news articles about verifying the voice of
Osama on one of his tapes, but didn't quickly find anyone offering
expertise as a service. This might be the direction to go in, however.

William Bliss

****************************************************

I would start with professional literature specifically on forensic
analysis of wiretap data, before looking to the broader acoustics
literature. There is a forensics professional organization in NZ,
maybe they can help locate appropriate journals and experts:
Australian and New Zealand Forensic Science Society (ANZFSS)

http://www.nifs.com.au/ANZFSS/ANZFSS.html

Steve Lowe

****************************************************

I asked my wife, who did her MSc dissertation in Edinburgh on forensic
linguistics. She writes: Try Peter French (forensic linguistics -
can't remember where he is) or Hermann Künzel at Bundeskriminalamt,
Wiesbaden (he's done work on speaker identification using signal
analysis and pattern recognition - though I only have a paper from 10
years ago). Or else try the Society of Forensic linguistics.

Bernard Payne

****************************************************

While I am not an expert in the field of forensic speaker
identification, either, you might find something useful on the website
of the "International Association for Forensic Phonetics":

http://www.iafp.net/

The Journal "Forensic Linguistics" has some articles on the topic

(http://www.builder.bham.ac.uk/forensiclinguistics/welcome.asp),

e.g.:

* Schiller, N. O. & Köster, O. (1998). The ability of expert witnesses
to identify voices: A comparison between trained and untrained
listeners. Forensic Linguistics. The International Journal of Speech,
Language and the Law, 5, 1-9.

* Künzel, H. J. (1994). On the problem of speaker identification by
victims and witnesses. In: Forensic Linguistics 1, 45-59.

For a more general, short intro to the subject of forensic
linguistics, you might want to check

http://www.csa.com/hottopics/linglaw/overview.html

-- Caren Brinckmann

********************************************

Responses discussing issues in or how to do voice identification.

I have faced your situation several times. I don't know the final
answers but here are my thoughts anyway. An outsider can't know the
names of the speakers. Best thing to do is to mark a transcript UM1,
UM2, UF1, UF2, etc. (UM means unidentified male, etc) if there is
adequate reason to suspect that the UMs and UFs are different. It's
often possible to identify male from female voices, although not
always. Children's voices are a serious problem of course. Even with
this caution, you'll need to have some auditory or acoustic phonetic
evidence to support your separation of UM1 from UM2, etc. Consistent
differences in vowel production are one such way. Idiosyncratic word
usage, grammatical structures, etc. are another. Speech affectations,
such as lisps, laryngealization, creaky voice, etc. help too. Of
course, if the speakers happen to name each other at some point, you
can justify using the assigned names as well. Warning, I had a case
one time where three of the four speakers were named John. Sound
spectrography can help a lot here, if you have expertise in it and
access to it. If not, it is not unusual to request someone to do it.
Police transcripts are notoriously bad in such matters, often
conveniently agreeing with the police's theory of the case.
Fortunately, the person who makes the police transcript should be
subject to the same questioning that you will face. You will need to
call on your field's expertise to trump it when and if it is different
from yours. Are you skilled in phonetics? Consistently different
pitch, pace and intonation contours can help too, as can different
accents, if any exist. Things get more problematic when the tape
contains heightened emotions. Voices tend to get higher, blurring even
female/male contrasts sometimes.

Roger Shuy

****************************************************

I would agree with the law firm that witnesses' judgments may not be
100% accurate. I think the best way to go is to use speech recognition
technology to make objective and scientific judgments based on
acoustic analysis of the voices of the participants in that
conversation. Although the technology may not be able to identify
idiosyncratic properties of everyone on the globe, I believe it can
make accurate identification from a small number of voices which is
the case in your situation.

Ali Farghaly, Ph.D.

****************************************************

I did my doctoral dissertation on turn-taking, within a conversation
analysis framework. I defended it successfully this past December, in
Mexico City. The hard job was transcribing the tapes of conversations,
especially one where I had 12 people around the table, and initially
they were not aware of the portable tape recorder. Even though the
participants were all people I knew (they were part of the family, so
I could recognize their voices), there were constant interruptions,
and the sound was not always the best, since the conditions of
recording were not optimal. There are many factors that could make it
difficult to identify the speakers: number of speakers, interruptions,
overlap of two or more conversations, speakers with similar voices,
distortion of sound, noise, quality of tape (if it is
non-professional, which I assume is the case). Audio-cassettes are not
as reliable as mini-discs. I worked with transcribing both, and the
difference in sound quality was significant. I recall one particular
recording, on mini-disc. There were places where two conversations
overlapped, but on the mini-disc I was able to "separate" the two
conversations: to concentrate on one conversation and transcribe, then
to rewind and concentrate on the other conversation and
transcribe. But the same did not happen with audio-tapes, especially
those recorded with portable tape-recorders. I don't know what kind of
machine was used for the recording, but I know for a fact that the
kind of tape-recorder used affects the sound quality dramatically. A
lot of my data was part of a project on studying the Spanish of Mexico
City, and different recording machines were used: mini-disc, digital
(with lapel microphones), and audio, with just the integrated
microphone on the portable tape-recorder. The mini-disc was the best,
and the audio the worst.

Gina Musselman

****************************************************

I noticed your query on the Linguist List regarding speaker
identification. I'm not sure if this will help, but I have attached a
paper that is in press at the Journal of Experimental Psychology:
Human Perception & Performance (it should be out soon). The paper
examines the phenomenon of change detection in the auditory domain.
The paper describes a couple of experiments in which participants
heard a list of words over a set of headphones. Halfway through the
list a different voice began to present the words. Only about 40% of
the participants noticed that the voice presenting the list changed to
a different person...not anywhere near as accurate as a layperson
might expect ("They're two different voices, how can you not tell the
difference?").

Mike Vitevitch

****************************************************

If I understand things correctly, the conversation involves a number
of speakers, and the concern of the law firm is that utterances may be
incorrectly attributed to the various speakers?  If the actual number
and identities of all the possible speakers is known, then the
assignment of conversational turns to particular speakers can probably
be done with a high level of success using expert auditory and
acoustic analysis. In optimal circumstances, lay listeners may appear
to be equally successful, but their success and reliability will be
strongly influenced by factors such as:

* familiarity with the speakers' voices before this conversation was
recorded (high familiarity should mean the witnesses/lay listeners are
fairly accurate in their attribution of turns to particular speakers;
low familiarity with the voices makes them much less reliable
witnesses and their testimony should be regarded as far less accurate
than independent expert analysis)

* listeners always operate with expectations about the structure of
conversation, as I'm sure you know, and parties involved in a case are
rarely able to separate their strong expectations about who will say
what from what is objectively present in a conversation. The typical
example is police officers transcribing conversations inaccurately
because of their expectations of what will be said or who will say
specific things. Of course, defence parties will commit similar
mistakes.

There are a number of books on forensic speech analysis that might be
looked at, though the main point which you would quickly be able to
refer to is just that listeners are often very unreliable in their
identification of speakers, due to the multitude of influencing
factors: expectation/memory/bias/available information/etc.

Rose, Philip (2002): Forensic Speaker Identification
Hollien, Harry (2002): Forensic Voice Identification
Hollien, Harry (1996?): The Acoustics of Crime

There is basic information about forensic speech analysis on my
website (see below); also at the website of Helen Fraser at University
of New England (Armidale, Australia), which should be found by Google
search if you enter 'Helen Fraser forensic phonetics'

- Dr Duncan Markham

http://www.interfaceanalysts.com/forensic.html

****************************************************

I testified in a similar case in Liverpool, UK two years ago. The
person accused by the police was "identified" by means of
impressionistic criteria by the "president" of the English Society of
Forensic Phoneticians. Acoustic analysis revealed that the person in
question would have had to stretch his neck some 3 cm to produce
signals that supposedly were his. He was acquitted.  You can access my
CV  which demonstrates competence in this area including
consulting for the FBI and the NTSB (on the EgyptAir crash - which
involved voice identification).

Philip Lieberman

****************************************************

Oh for heaven's sake, doesn't anyone remember the scientific method?
If you want to show that speaker identification is unreliable then
_test_it_. Find some environment that is similar to the one where
the conversations in question took place -- a restaurant, legal
office, streetcorner, whatever. Bring a video camera and record some
conversation. Ask some experts to try to distinguish the speakers by
audio alone, using whatever methods they are using on the evidence
tapes, then look at the video to see how they did.

1) The recordings must be candid, if that is legally/ethically
possible.

2) Try to get the same sound quality or better than in the evidence
tapes.

3) Try to get the same _style_ or better.  By good style I mean long
sentences and people seldom talking at once.

In other words, make the test as realistic as you can, and always give
the opposition the benefit of the doubt so that they will have no
excuse if their experts fail.

Ben Thompson
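
The scoring step of the test Ben describes (compare each listener's
audio-only attributions with what the video shows) could be sketched
roughly as follows. This is only an illustration: the function name,
the speaker labels and the sample data are all hypothetical, and the
post itself does not specify any scoring procedure.

```python
from collections import Counter

def score_attributions(truth, guesses):
    """Score one listener's turn-by-turn speaker attributions.

    truth   -- speaker label per turn, taken from the video (assumed ground truth)
    guesses -- speaker label per turn, as judged from audio alone
    Returns (accuracy, confusions), where confusions counts each
    (true_speaker, guessed_speaker) pair among the wrong answers.
    """
    if len(truth) != len(guesses):
        raise ValueError("truth and guesses must cover the same turns")
    if not truth:
        raise ValueError("no turns to score")
    pairs = list(zip(truth, guesses))
    correct = sum(t == g for t, g in pairs)
    confusions = Counter((t, g) for t, g in pairs if t != g)
    return correct / len(truth), confusions

# Hypothetical example: eight turns, three speakers labelled A, B, C.
truth = ["A", "B", "A", "C", "B", "A", "C", "B"]
expert = ["A", "B", "B", "C", "B", "A", "A", "B"]

accuracy, confusions = score_attributions(truth, expert)
print(f"accuracy: {accuracy:.2f}")
for (true_spk, guessed_spk), count in confusions.most_common():
    print(f"turns by {true_spk} attributed to {guessed_spk}: {count}")
```

Listing the confusions as well as the overall accuracy shows which
pairs of voices a listener mixes up, which may matter more in court
than a single percentage.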

****************************************************

The controversy about forensic use of "voiceprints" has been
around for at least 25 years. I have not followed the controversy
lately, but in the 70's and 80's the legal resolution was complicated
by the fact that the proponents of the methodology (mostly ex-police
officers rather than speech scientists, but including some scientists
and clinicians) had their own professional society.  Although the
vast majority of members of scientific bodies such as Acoustical
Society of America thought the technique was not reliable under
forensic conditions, very few members of ASA qualified as expert
witnesses, because they were not trained in the particular techniques
nor were they members of the particular professional society. The few
who were (I believe Peter Ladefoged was in this group) took it on as
almost a mission to try to impose scientific standards and to testify
against overuse of the methodology.  Ben's suggestion of the use of
the scientific method is excellent, and is certainly the right
approach to the broader controversy.  However, it is probably not a
fit for this particular case for several reasons:

1) If the case is already at trial, there is probably not time.

2) Determining turn-taking is a very different, and much easier task
than "voice identification", or even "voice verification."

You are distinguishing between only two voices and you have known
samples of each voice under the identical recording conditions (from
unambiguous parts of the conversation). This is easier even than voice
verification in which you are making a binary accept/reject decision
of one person against the rest of the population. On the other hand,
the task can still be arbitrarily hard and error prone under noisy
recording conditions.  I am fairly certain that a careful scientific
study will show that it is very easy to tell apart at least some pairs
of voices (under reasonable recording conditions). If this is true,
then the average error rate for a random pair of voices will not be
relevant for a trial in which there is a particular pair of voices in
a particular recording condition. That is, it seems that no conclusion
could be drawn except by studying the actual recordings.  Assuming
that New Zealand courts have a "reasonable doubt" criterion, one way
to show that the turn taking discrimination is unreliable on this
particular set of recordings is to have several experts independently
label the turn taking. To show reasonable doubt, it would not be
necessary to test the experts used by the other side, but merely to
have several other experts each independently do the task.  If
the experts are not in substantial agreement, it would cast doubt on
anyone's ability to do the task on this particular set of
recordings. Thus the evidence would be doubt about these particular
recordings, rather than trying to prove that discrimination of turn
taking is impractical in general. Of course if the new experts all
agree with the other side's experts, then you might have to accept
that the turn taking determination is reliable on these recordings.

Jim
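
Jim's proposal (several experts independently labelling the turns,
then checking whether they are in substantial agreement) can be put
into numbers with a chance-corrected agreement statistic. The sketch
below uses Fleiss' kappa, which is my choice for illustration and not
something named in the post; the function name, the labels S1/S2 and
the sample data are hypothetical.

```python
from collections import Counter

def fleiss_kappa(labels_by_expert):
    """Fleiss' kappa for several experts labelling the same turns.

    labels_by_expert -- one list of speaker labels per expert, all the
    same length and in the same turn order.
    Returns 1.0 for perfect agreement, about 0 for chance-level agreement.
    """
    n_experts = len(labels_by_expert)
    if n_experts < 2:
        raise ValueError("need at least two experts")
    n_turns = len(labels_by_expert[0])
    if n_turns == 0 or any(len(l) != n_turns for l in labels_by_expert):
        raise ValueError("every expert must label the same non-empty set of turns")

    # For each turn, how many experts chose each speaker label.
    per_turn = [Counter(expert[i] for expert in labels_by_expert)
                for i in range(n_turns)]

    # Observed agreement: mean proportion of agreeing expert pairs per turn.
    pairs = n_experts * (n_experts - 1)
    p_obs = sum((sum(c * c for c in counts.values()) - n_experts) / pairs
                for counts in per_turn) / n_turns

    # Expected agreement from the overall share of each label.
    totals = Counter()
    for counts in per_turn:
        totals.update(counts)
    p_exp = sum((c / (n_turns * n_experts)) ** 2 for c in totals.values())

    if p_exp == 1.0:
        return 1.0  # every expert used one and the same label throughout
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical example: three experts, six turns, two candidate speakers.
expert_1 = ["S1", "S2", "S1", "S2", "S1", "S2"]
expert_2 = ["S1", "S2", "S1", "S1", "S1", "S2"]
expert_3 = ["S1", "S2", "S2", "S2", "S1", "S2"]

kappa = fleiss_kappa([expert_1, expert_2, expert_3])
print(f"Fleiss' kappa: {kappa:.3f}")
```

What counts as "substantial" agreement is a judgment call that the
statistic does not settle; a low value on these particular recordings
would simply support the doubt Jim describes.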

---------------------------------------------------------------------------
LINGUIST List: Vol-14-876