16.2246, Review: Textbooks/Phonetics/Comp Ling: Coleman (2005)

Sun Jul 24 16:53:03 UTC 2005

LINGUIST List: Vol-16-2246. Sun Jul 24 2005. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Naomi Ogasawara <naomi at linguistlist.org>
================================================================  

What follows is a review or discussion note contributed to our 
Book Discussion Forum. We expect discussions to be informal and 
interactive; and the author of the book discussed is cordially 
invited to join in. If you are interested in leading a book 
discussion, look for books announced on LINGUIST as "available 
for review." Then contact Sheila Dooley at collberg at linguistlist.org. 

===========================Directory==============================  

1)
Date: 24-Jul-2005
From: David Deterding < dhdeter at nie.edu.sg >
Subject: Introducing Speech and Language Processing 

-------------------------Message 1 ---------------------------------- 
Date: Sun, 24 Jul 2005 12:50:42
From: David Deterding < dhdeter at nie.edu.sg >
Subject: Introducing Speech and Language Processing 

AUTHOR: Coleman, John
TITLE: Introducing Speech and Language Processing
SERIES: Cambridge Introductions to Language and Linguistics
YEAR: 2005
PUBLISHER: Cambridge University Press
Announced at http://linguistlist.org/issues/16/16-631.html

David Deterding, NIE/NTU, Singapore 

OVERVIEW

This book is an introduction to two separate but related areas: speech 
analysis and language processing. It aims to provide a straightforward 
introduction to these two topics, suitable for readers with some knowledge 
of phonetics and grammar but little or no background in the computer 
analysis or manipulation of speech and language, and it provides an 
introduction to such techniques as digital filtering, linear predictive 
coding, deterministic and non-deterministic parsing, and Markov modelling 
of speech. 

Most of the computer programs that are discussed in the text are provided 
in an accompanying CD-ROM, including C programs for signal processing and 
Prolog programs for parsing, and the reader is encouraged not just to run 
these programs but to modify them so as to become fully familiar with 
their structure and operation.

SYNOPSIS

After an introductory chapter outlining the contents and aims of the book, 
Chapters 2, 3 and 4 introduce some signal processing techniques with 
illustrative programs all written in C. Chapter 2 deals with the 
generation of a simple cosine wave, Chapter 3 presents basic digital 
filters, and Chapter 4 covers linear predictive coding for modelling the 
spectral characteristics of speech. In all these areas, the presentation 
introduces the techniques step-by-step, making a commendable effort to 
explain all aspects of the programs in a style that is accessible to 
readers with no background in signal processing or computer programming.

In Chapter 5 the focus shifts to the use of Prolog programs to demonstrate 
the implementation of finite-state machines, in order to parse and also 
generate phonologically well-formed strings of phonemes in English. Once 
more, the reader is taken through the example programs line by line, to 
ensure that even those with no previous knowledge of Prolog can easily 
understand the code and modify it if they choose.
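
As an illustration of the finite-state idea only, here is a minimal sketch in C (the book itself uses Prolog for this chapter). The machine below is a hypothetical toy, not Coleman's: it works on the symbols 'C' (any consonant) and 'V' (any vowel) rather than on real phonemes, and it accepts just the syllable shapes (C)(C)V(C). The state names and the function names step and accepts are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* States of a toy acceptor for the shapes (C)(C)V(C).
   These states are illustrative assumptions, not the book's machine. */
enum { START, ONSET1, ONSET2, NUCLEUS, CODA, REJECT };

/* One transition: the next state for the current state and input symbol. */
static int step(int state, char symbol)
{
    switch (state) {
    case START:
        return symbol == 'C' ? ONSET1 : symbol == 'V' ? NUCLEUS : REJECT;
    case ONSET1:
        return symbol == 'C' ? ONSET2 : symbol == 'V' ? NUCLEUS : REJECT;
    case ONSET2:
        return symbol == 'V' ? NUCLEUS : REJECT;
    case NUCLEUS:
        return symbol == 'C' ? CODA : REJECT;
    default:
        return REJECT;  /* CODA and REJECT have no outgoing arcs */
    }
}

/* Returns 1 if the whole string ends in an accepting state, else 0. */
int accepts(const char *s)
{
    int state = START;
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++)
        state = step(state, s[i]);
    return state == NUCLEUS || state == CODA;
}
```

With this toy machine, accepts("CCV") and accepts("CVC") succeed, while accepts("CCCV") fails because no arc leaves the second onset state on a consonant. A realistic phonotactic machine would of course need arcs labelled with individual phonemes.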

Chapter 6 covers speech recognition techniques, including dynamic time 
warping and vector quantization. And Chapter 7 deals with the importance 
of incorporating probability estimates in finite-state models, including a 
substantial discussion of the need for probabilistic parsing despite the 
theoretical objections of many linguists such as Chomsky. Neither Chapter 
6 nor Chapter 7 includes illustrative programs, presumably because some of 
the techniques discussed, such as Hidden Markov Models, would be just too 
long and complicated for an introductory book, though a simple 
implementation of dynamic time warping might well have been feasible.
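
To give a sense of how short such a program could be, here is a minimal sketch of dynamic time warping in C. It is not taken from the book. It assumes one-dimensional sequences, an absolute-difference local cost, and the three standard step directions (insertion, deletion, match); real speech recognition would compare vectors of spectral or lpc parameters frame by frame. The function name dtw_distance is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Smallest of three values. */
static double min3(double a, double b, double c)
{
    double m = a < b ? a : b;
    return m < c ? m : c;
}

/* Accumulated DTW cost of aligning a[0..n-1] with b[0..m-1].
   Assumes local cost |a[i] - b[j]|. Returns -1.0 if memory runs out. */
double dtw_distance(const double *a, size_t n, const double *b, size_t m)
{
    size_t cols = m + 1, i, j;
    double *d, result;

    assert(n > 0 && m > 0);
    d = malloc((n + 1) * cols * sizeof *d);
    if (d == NULL)
        return -1.0;

    /* Border cells are unreachable, except the origin. */
    for (i = 0; i < (n + 1) * cols; i++)
        d[i] = DBL_MAX;
    d[0] = 0.0;

    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            double diff = a[i - 1] - b[j - 1];
            double cost = diff < 0 ? -diff : diff;
            /* Cheapest way to arrive: from above, from the left,
               or diagonally. */
            d[i * cols + j] = cost + min3(d[(i - 1) * cols + j],
                                          d[i * cols + (j - 1)],
                                          d[(i - 1) * cols + (j - 1)]);
        }
    }
    result = d[n * cols + m];
    free(d);
    return result;
}
```

For example, the sequences {1, 2, 3} and {1, 2, 2, 3} align at a cost of 0, because the repeated 2 is simply absorbed by the warping, which is exactly the property that makes the technique useful for utterances spoken at different rates.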

Chapter 8 introduces syntactic parsing, with some basic programs written 
in Prolog for parsing of a very limited set of English sentences. And 
Chapter 9 discusses the practical issues of incorporating probability into 
the parsing algorithm, clearly demonstrating that there is no reason why 
sentences that have never been uttered before should pose a problem for 
probabilistic parsers, as was once claimed by Chomsky. Finally, at the end 
of Chapter 9, the implementation of a simple probabilistic context-free 
grammar is illustrated in Prolog.
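
The point about unseen sentences can be sketched in a few lines of C (the book's own implementation is in Prolog and is not reproduced here). In a probabilistic context-free grammar, the probability of a parse is the product of the probabilities of the rules it uses, so a sentence never encountered before still receives a non-zero probability provided each rule in its parse has one. The rule probabilities in the usage note below are invented purely for illustration, and the names struct rule and parse_probability are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One rule application in a parse, with its (assumed) probability. */
struct rule {
    const char *text;  /* e.g. "s --> np, vp", for readability only */
    double prob;       /* probability of this rule given its left-hand side */
};

/* Probability of a parse: the product of the probabilities of the
   rules used to build it. */
double parse_probability(const struct rule *used, size_t count)
{
    double p = 1.0;
    size_t i;

    assert(used != NULL);
    for (i = 0; i < count; i++)
        p *= used[i].prob;
    return p;
}
```

If a parse used three rules with the made-up probabilities 1.0, 0.5 and 0.25, its probability would be 0.125. Nothing in the calculation depends on whether the sentence as a whole has been observed before.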

CRITICAL EVALUATION

One issue with regard to this book can be illustrated by the effort to 
clarify a single line of code in the first C program that is presented:

x = (short int *) calloc(length,sizeof(short int));

Over half a page (pp. 37-38) is spent carefully explaining that this 
allocates memory for an array of short integers, but it is unfortunately 
probably true that many potential readers, even some with a substantial 
interest in the analysis and manipulation of speech, will find some of 
this explanation impenetrable. 

In fact, for the line of code listed above, the text never fully explains 
what the first part of the line does: that (short int *) casts the 
untyped pointer returned by calloc to a pointer to a short integer. 
Presumably it is assumed that going into too much detail about the use of 
pointers in C is not appropriate for an introductory book on speech 
processing. But this means that those readers who have no problems with 
the technical aspects of the text might end up frustrated when the whole 
of the code is not explained.
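
For readers who want the missing piece spelled out, here is a minimal sketch in C built around the same call. The wrapper function alloc_samples is hypothetical and is not from the book. The standard library function calloc returns a pointer of type void *, and the cast (short int *) converts it to the type of x. In C this conversion would also happen implicitly, so the cast is optional, whereas C++ requires it.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a buffer of `length` short integers, all set to zero.
   Returns NULL if the memory cannot be allocated; the caller is
   responsible for calling free() on the result. */
short int *alloc_samples(size_t length)
{
    short int *x;

    assert(length > 0);
    /* calloc returns void *; the cast converts it to short int *. */
    x = (short int *) calloc(length, sizeof(short int));
    return x;
}
```

A caller would check the result against NULL before use, for instance before writing sample values into x[0] through x[length - 1], and release the buffer with free(x) afterwards.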

So, has Coleman got it right, in attempting to explain as much as possible 
about how the code works but not necessarily going into all the details? I 
think he has, and the level of detail is about right. One probably needs 
to accept that it is necessary for readers to run the programs and also 
manipulate them if they are to gain a reasonable understanding of the 
material covered in this kind of practical textbook, and if some readers 
find they cannot cope with the analysis and compilation of the code, well, 
so be it. 

Another example of technical details that some readers may find a bit 
daunting is the discussion of big endian and little endian computers (p. 
32). Most of us really do not care how our computers store integers so 
long as they work fine. So is it really necessary to go into these details 
about how integers are stored? Well, yes it probably is. If readers are to 
be able to load speech data into programs and then manipulate the data in 
various ways, then they probably do need to find out if they are working 
on a big endian machine (Motorola) or a little endian machine (Intel). So, 
once more, distasteful as this discussion might be to some readers, 
Coleman probably has got it right. Indeed, throughout the book, he always 
makes an admirable effort to present the material in a style and format 
that is accessible even to those with no background in computer 
programming, and by and large these efforts are probably highly 
successful, even if it may be necessary to acknowledge that some readers 
will not be able to grasp all the concepts.
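
To show why byte order matters in practice, here is a minimal sketch in C, not taken from the book. It assumes 16-bit samples stored low byte first (little endian) in the data being read; the function names is_little_endian, swap16 and read_le16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the low-order byte first. */
int is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* look at the first byte in memory */
    return first == 1;
}

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t) ((v << 8) | (v >> 8));
}

/* Decode one signed 16-bit little-endian sample from two raw bytes.
   Works the same way whatever the byte order of the host machine. */
int16_t read_le16(const unsigned char *p)
{
    uint16_t u;

    assert(p != NULL);
    u = (uint16_t) (p[0] | (p[1] << 8));
    /* Map values of 0x8000 and above to their negative counterparts. */
    return u >= 0x8000 ? (int16_t) ((int) u - 0x10000) : (int16_t) u;
}
```

Reading samples byte by byte, as read_le16 does, sidesteps the question of which kind of machine the program runs on. Loading a whole file straight into an array of short integers is quicker to write but gives wrong values when the file and the machine disagree, in which case each value needs swap16 applied.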

Coleman makes no claims to expertise in syntax. In fact he admits (p. 223) 
that he probably knows rather less about syntax than many readers. And, 
indeed, a few aspects of the syntactic models that he presents are a bit 
suspect. For example, he adopts a rather traditional generative model of 
English, with rules such as np --> det, adj, n (p. 232), eschewing the use 
of determiner phrases that are proposed in many more recent models. But 
then the first rule is ip --> np, vp, and this use of ip to represent a 
sentence makes no sense when the sentence includes no inflectional 
component, i, that can act as the head of the ip. It would have been 
better here to stick to the traditional use of s to represent the top node 
of a sentence (as indeed is done in Chapter 9, with no explanation for the 
switch). But such minor quibbles miss the point: this is not a textbook on 
syntax. It is an introductory text on signal processing and language 
parsing, and it presents these topics exceptionally well and very clearly. 

Occasionally, gaps remain in the implementation of some techniques. For 
example, the use of a finite state transducer is described (pp. 144-149) 
for matching simple sequences of vowels and consonants against stored 
arrays of linear prediction coefficients, but many readers will wonder how 
the closest match is computed between a new set of lpc values and the 
stored data. Although this is (partially) resolved when vector 
quantization is introduced (p. 179), thirty pages is a bit long to leave 
readers pondering over this rather central issue. Furthermore, with regard 
to the implementation of the finite state transducer, the simple matching 
algorithm only mentions vowels and fricatives, and this fails to deal with 
the obvious issue that plosives are characterised by silence so that the 
only way /b, d, g/ can be differentiated from each other is by means of 
their transitions from and to the neighbouring sounds, something which 
cannot be handled by means of single targets for each phoneme. But once 
more, maybe this is missing the point: the aim of the book is to introduce 
a wide range of speech processing techniques in a practical and 
straightforward manner, not to go into all the details of their 
implementation. And this it does extremely well, so we should not quibble 
too much about some minor flaws in the simple implementations, or worry if 
all of the details are not fleshed out.
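
On the question of how a closest match might be computed, here is a minimal sketch in C of a nearest-neighbour search over a small codebook, which is the core step of vector quantization. It is not the book's code. It assumes squared Euclidean distance as the measure of similarity, which is a simplification, since other distance measures are often preferred for lpc parameters. The constant ORDER and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ORDER 4  /* assumed number of coefficients per frame */

/* Squared Euclidean distance between two vectors of length ORDER. */
static double sq_dist(const double *a, const double *b)
{
    double sum = 0.0;
    size_t k;

    for (k = 0; k < ORDER; k++) {
        double diff = a[k] - b[k];
        sum += diff * diff;
    }
    return sum;
}

/* Index of the codebook entry closest to `frame`. The codebook is a
   flat array holding `entries` vectors of ORDER values each. */
size_t nearest_entry(const double *codebook, size_t entries,
                     const double *frame)
{
    size_t best = 0, i;
    double best_dist;

    assert(entries > 0);
    best_dist = sq_dist(codebook, frame);
    for (i = 1; i < entries; i++) {
        double d = sq_dist(codebook + i * ORDER, frame);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

Each stored vector would stand for one target sound, and a new frame is labelled with the index of whichever stored vector lies nearest. This also makes the limitation noted above concrete: a single target per phoneme cannot capture the transitions that distinguish the plosives.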

Overall, Coleman is to be congratulated on this handsomely produced, 
easily accessible, fascinating book which many, many students of speech 
and language will undoubtedly find exceptionally valuable. 

ABOUT THE REVIEWER 

David Deterding is an Associate Professor at NIE/NTU, Singapore, where he 
teaches phonetics, phonology, syntax, and Chinese-English translation.
