17.2233, Sum: Sound-File Formats for Speech Recordings

Thu Aug 3 16:03:50 UTC 2006

LINGUIST List: Vol-17-2233. Thu Aug 03 2006. ISSN: 1068 - 4875.

Subject: 17.2233, Sum: Sound-File Formats for Speech Recordings

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Laura Welcher, Rosetta Project / Long Now Foundation  
 <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Kevin Burrows <kevin at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 01-Aug-2006
From: Mario Cal-Varela < iamario at usc.es >
Subject: Sound-File Formats for Speech Recordings 

-------------------------Message 1 ---------------------------------- 
Date: Thu, 03 Aug 2006 12:02:14
From: Mario Cal-Varela < iamario at usc.es >
Subject: Sound-File Formats for Speech Recordings 

Query for this summary posted in LINGUIST Issue: 17.2131                       

Regarding Query: http://linguistlist.org/issues/17/17-2131.html

Dear Linguists:

On July 24 I posted a query to the list about the suitability of
different file formats for computerized speech analysis. This was the
original text of the query:

I'd like to compare digital speech samples collected from different
sources, including online radio and samples I digitized myself from
analog sources. I'm especially interested in fundamental frequency and
formant position, as well as time-related aspects of segments (specifically
VOT and vowel duration). My questions are the following:

What features of the speech signal (and in what ways) may be affected by
the format of speech samples (MP3, WAV, streaming audio...)? Are the results
of spectrographic analysis of samples with different file formats and
qualities comparable? Is there any relevant bibliography available on this
issue?

First of all, thanks very much to those people who responded to my
questions and provided very useful and relevant suggestions:

James L. Fidelholtz, Benemérita Universidad Autónoma de Puebla, MÉXICO
Mark J. Jones, University of Cambridge
Dominic Watt, University of Aberdeen
Damien Hall, University of Pennsylvania
Heriberto Avelino, University of California at Berkeley

Here is a quick summary of their comments:

Measurements of duration and other time-related aspects of the signal do
not seem to be affected by file format. For formant and F0 analysis,
however, the consensus is that, among the usual formats, only uncompressed
.WAV and .AIFF files are safe bets. The lossy compression used in MP3,
MiniDisc (ATRAC) and similar formats alters the signal in many different
ways and generally degrades it.
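A file's extension says little about how its audio is actually encoded (a .wav container can, for instance, carry MP3-compressed data), so it may be worth checking the header before measuring formants or F0. The following is a minimal, hypothetical helper, not something the respondents proposed: it reads the first bytes of a file and reports whether they look like PCM WAV, AIFF, AIFF-C or MPEG audio. It only inspects the standard RIFF/WAVE and AIFF signatures and the WAVE format tag; it cannot tell whether a PCM file was itself decoded from an MP3 or a radio stream, in which case the compression damage is still there.

```python
import struct

# Format tags found in the "fmt " chunk of a RIFF/WAVE file.
WAVE_FORMAT_TAGS = {
    0x0001: "PCM (uncompressed)",
    0x0003: "IEEE float (uncompressed)",
    0x0055: "MPEG Layer 3 (lossy)",
    0xFFFE: "WAVE_FORMAT_EXTENSIBLE (check SubFormat)",
}


def describe_audio_header(data: bytes) -> str:
    """Guess the real encoding of a sound file from its first bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        pos = 12
        # Walk the RIFF chunks until the "fmt " chunk is found.
        while pos + 10 <= len(data):
            chunk_id = data[pos:pos + 4]
            (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
            if chunk_id == b"fmt ":
                (tag,) = struct.unpack("<H", data[pos + 8:pos + 10])
                return "WAV: " + WAVE_FORMAT_TAGS.get(tag, f"format tag 0x{tag:04X}")
            pos += 8 + size + (size & 1)  # chunk data is padded to an even length
        return "WAV: fmt chunk not found in the bytes read"
    if data[:4] == b"FORM" and data[8:12] == b"AIFF":
        return "AIFF: PCM (uncompressed)"
    if data[:4] == b"FORM" and data[8:12] == b"AIFC":
        return "AIFF-C: check the compression type (may be lossy)"
    # ID3 tag, or an MPEG frame sync (0xFF followed by three set bits).
    if data[:3] == b"ID3" or (len(data) > 1 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "MPEG audio such as MP3 or AAC (lossy)"
    return "unknown"


def describe_audio_file(path: str, nbytes: int = 4096) -> str:
    """Read the start of a file and classify it, whatever its extension says."""
    with open(path, "rb") as f:
        return describe_audio_header(f.read(nbytes))
```

Usage would be something like `describe_audio_file("sample.wav")`, where the file name is only an example.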

On the other hand, James Fidelholtz comments that, if properly processed,
even very noisy speech can yield to acoustic analysis. For example, he
suggests using cepstral analysis to obtain the formants and F0, following
these steps:
   1) Digitize the signal if it is analog, or obtain the digitized signal
if one is available. (= S)
   2) Compute a spectrum of the signal. [Sp(S)]
   3) Compute a cepstrum of Sp(S) (a spectrum of the log spectrum, often
loosely described as "the spectrum of the spectrum"). Its peak gives the
fundamental frequency F0 for each successive analysis frame of the signal.
   4) Have the computer consider *only* the points of Sp(S) that lie
'near' integral multiples of F0, and plot the result. This will give you
the formants, even for extremely noisy speech.
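The four steps above can be sketched in Python for a single analysis frame. This is a minimal illustration under stated assumptions, not Fidelholtz's own procedure: it applies a Hann window, uses the standard real cepstrum (inverse FFT of the log magnitude spectrum), assumes an F0 search range of 60 to 400 Hz, and treats local maxima of the harmonic amplitudes as rough formant candidates (the original steps only say to plot the result). All function names and the synthetic test frame are hypothetical. A small radix-2 FFT is included so that only the standard library is needed; in practice a program such as Praat or a numerical library would do this work.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Step 2: magnitude spectrum Sp(S) of one Hann-windowed frame."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    return [abs(c) for c in fft([s * w for s, w in zip(frame, window)])]


def cepstral_f0(spec, fs, fmin=60.0, fmax=400.0):
    """Step 3: F0 from the highest real-cepstrum peak between fmin and fmax."""
    n = len(spec)
    log_mag = [math.log(m + 1e-12) for m in spec]  # small floor avoids log(0)
    # The log magnitude spectrum of a real frame is real and symmetric, so a
    # forward FFT divided by n equals its inverse FFT: the real cepstrum.
    cep = [c.real / n for c in fft(log_mag)]
    q_lo = int(fs / fmax)
    q_hi = min(int(fs / fmin), n // 2)
    peak_q = max(range(q_lo, q_hi + 1), key=lambda q: cep[q])
    return fs / peak_q  # quefrency in samples -> frequency in Hz


def harmonic_points(spec, fs, f0, search=0.25):
    """Step 4: keep only the spectral peak near each integer multiple of F0."""
    n = len(spec)
    bin_hz = fs / n
    half = max(1, int(search * f0 / bin_hz))  # search +/- a quarter of F0
    points = []
    h = 1
    while h * f0 < fs / 2:
        centre = int(round(h * f0 / bin_hz))
        lo, hi = max(1, centre - half), min(n // 2 - 1, centre + half)
        k = max(range(lo, hi + 1), key=lambda b: spec[b])
        points.append((k * bin_hz, 20 * math.log10(spec[k] + 1e-12)))
        h += 1
    return points


def formant_candidates(points):
    """Local maxima of the harmonic amplitudes: a crude stand-in for formants."""
    return [points[i][0] for i in range(1, len(points) - 1)
            if points[i - 1][1] < points[i][1] >= points[i + 1][1]]


def synthetic_vowel_frame(f0=125.0, fs=16000, n=1024):
    """Hypothetical test frame: harmonics of f0 with spectral peaks near 500 and 1500 Hz."""
    def amp(f):
        return (math.exp(-((f - 500) / 150) ** 2)
                + 0.5 * math.exp(-((f - 1500) / 200) ** 2)
                + 0.05 / (1 + f / 100))  # gentle downward spectral tilt
    parts = [(h * f0, amp(h * f0)) for h in range(1, int(fs / 2 / f0) + 1) if h * f0 < fs / 2]
    return [sum(a * math.cos(2 * math.pi * f * i / fs) for f, a in parts) for i in range(n)]
```

For a real recording you would repeat this on successive, overlapping frames to track F0 over time; with this simple FFT the frame length must be a power of two and long enough to contain at least two periods of the lowest expected F0.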

The topic seems to recur on discussion lists, so several respondents
suggest using search terms such as MP3, ATRAC, FORMANT, etc. on Google or
on discussion list search engines, for example on PHONET
(http://www.jiscmail.ac.uk/cgi-bin/webadmin?S1=phonet). Mark Jones points
to the following thread in the LINGUIST List archives:
http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0409&L=resource-network-linguistic-diversity&P=629.

The IEEE website is also mentioned by several respondents as a possible
source of further information (Institute of Electrical and Electronics
Engineers, Inc. http://www.ieee.org).

For an example of a major project that used digitized speech, Damien
Hall mentions the Atlas of North American English, which incidentally used
only WAV files (more information at: http://www.mouton-online.com/anae.php).

Respondents also suggested a few bibliographical sources on the topic:

- http://www.di.unipi.it/~lcioni/papers/2001/CompData.pdf

- Paul Foulkes and Catherine Byrne published an article a couple of years
ago in the International Journal of Speech, Language and the Law on changes
in formant frequencies (and I think F0) brought about by the signal
transmission properties of mobile telephone lines.

- Philip Harrison's work on the comparability and relative (un)reliability
of formant frequency measurements made using different software packages
(Praat, WaveSurfer/xwaves+, Sensimetrics, SpeechStation, etc.) is possibly
also relevant here.

- Some discussion of cepstral analysis can be found in a chapter by
Liljencrants in The Handbook of Phonetic Sciences (Blackwell), ed. by
Hardcastle & Laver, and probably also in Acoustic Phonetics by Kenneth N.
Stevens (MIT Press).

- On acoustics in Spanish there are books and articles by, for example,
Antonio Quilis or Borzone de Manrique. I'd also add Eugenio Martínez Celdrán.

Once more, thanks very much to the five kind respondents for all the useful
information and to the whole Linguist community.

Best regards,
Mario Cal Varela
University of Santiago de Compostela 

Linguistic Field(s): Phonetics
