Clan/mp3 timecoding update

Bartlomiej Plichta plichtab at MSU.EDU
Wed May 31 01:51:55 UTC 2006


Hello all,

Let me address the issues below.

Linguist - Wangka Maya wrote:
> I just brought some interested friends into the discussion, Mark 
> Piggott says:
>
> Interesting discussion....... Actually most digital formats compress 
> using one algorithm or another and are therefore "lossy" to some extent.
This is not exactly true. Not all digital audio formats use lossy 
compression. In fact, there are two popular standards, PCM and DSD, that 
store raw sample data, without compression. PCM is by far the most common
in professional applications (e.g., audio CDs), but DSD is becoming more 
and more widely accepted in the pro audio industry. Psychoacoustic 
compression is a relatively new thing and is used primarily in consumer 
applications (e.g., iPod, iTunes Music Store, etc.). There are other 
audio compression standards used, for example, in digital telephony, but 
they have little relevance to field linguists.
> For normal beings, like you and I Grant, mp3 and mp3 vers2 will 
> reproduce sound to a quality than is undistinguishable from the real 
> thing. I personally never use wav format - it's a Microsoft proprietry 
> format not open source......... 
This is not exactly true. Numerous psychoacoustic listening tests have 
shown that lossy compression can be distinguished from the uncompressed 
original. Of course, that depends on a lot of variables, including the 
compression ratio, the codec used, the playback hardware, the listener's 
auditory system, etc.

The WAVE standard is not proprietary. It is the native audio standard 
for Microsoft, but the spec is open. Anyone can write WAVE files, 
without licensing fees. The biggest problem with the WAVE standard is 
that over the years it has allowed a lot of variation in how the 
various data and metadata chunks are used in the WAVE file. Therefore, 
the Broadcast WAVE Format (BWF), which is a variant of WAVE, is perhaps 
a better choice for recording and storage. It is a widely used standard, 
with very good archival prospects.
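
To illustrate that anyone can write a WAVE file with freely available 
tools, here is a minimal sketch using the `wave` module from the Python 
standard library. The function name, the 440 Hz test tone, and the 
in-memory buffer are assumptions made for the example; it writes plain 
16-bit PCM only and does not produce the extra chunks that BWF defines.

```python
import io
import math
import struct
import wave


def write_sine_wave(target, sample_rate=48000, n_frames=480, freq_hz=440.0):
    """Write a mono 16-bit PCM sine tone to a path or file-like object."""
    frames = b"".join(
        # Half of full scale, packed as little-endian signed 16-bit integers.
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


# Hypothetical usage: write into memory instead of a file on disk.
buffer = io.BytesIO()
write_sine_wave(buffer)
data = buffer.getvalue()

# Read the header back to confirm what was stored.
with wave.open(io.BytesIO(data), "rb") as reader:
    params = reader.getparams()
```

Here `params` reports the channel count, sample width, sample rate, and 
frame count that were written, all recovered from the file header alone.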

The only truly standard-agnostic way of storing PCM data would be in a 
headerless file, but then the proper metadata would have to be supplied 
separately to read this file, e.g., sample rate, bit-depth, byte order, 
etc. This is a solution that some people have used, especially in 
speech engineering circles.
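
A rough sketch of why that metadata matters: the same raw bytes decode 
to different sample values depending on the bit-depth and byte order you 
assume. The function name and the `metadata` dictionary are hypothetical, 
and the code assumes signed integer samples.

```python
def decode_raw_pcm(raw, bit_depth, byte_order="little", channels=1):
    """Decode headerless signed-integer PCM bytes into per-frame tuples."""
    if bit_depth % 8 != 0:
        raise ValueError("bit depth must be a multiple of 8")
    width = bit_depth // 8
    frame_size = width * channels
    if len(raw) % frame_size != 0:
        raise ValueError("byte count is not a whole number of frames")
    frames = []
    for start in range(0, len(raw), frame_size):
        frames.append(tuple(
            int.from_bytes(raw[start + c * width:start + (c + 1) * width],
                           byte_order, signed=True)
            for c in range(channels)
        ))
    return frames


# Hypothetical side-car metadata that would have to travel with the raw file.
metadata = {"sample_rate": 16000, "bit_depth": 16, "byte_order": "little", "channels": 1}

raw = bytes([0x00, 0x01, 0xFF, 0xFF])  # two 16-bit samples, no header
as_little = decode_raw_pcm(raw, metadata["bit_depth"], "little", metadata["channels"])
as_big = decode_raw_pcm(raw, metadata["bit_depth"], "big", metadata["channels"])
duration_s = len(as_little) / metadata["sample_rate"]
```

With the correct (little-endian) assumption the first sample is 256; with 
the wrong byte order it comes out as 1, and nothing in the file itself 
tells you which reading is right.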
> So, how much "lossy" is ok? Is there a minimum standard, beyond which 
> is unacceptable (I know, least "lossy" is best)? And is .wav format 
> the least "lossy"?
Any audio file can be compressed in a lossy process. There are important 
differences, though. You can "compress" a PCM file (e.g., WAVE, AIFF, 
AU, etc.) by reducing its sample rate and bit-depth. For example, say 
you originally acquired your recording at 48,000 Hz and 24-bit. Then you 
downsample it to 16,000 Hz and lower the bit-depth to 16-bit so that 
older software can open the file. This process is lossy, as it 
removes original samples. This is a linear process.
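
The 48,000 Hz / 24-bit to 16,000 Hz / 16-bit example can be sketched 
roughly as follows. This is a toy illustration under stated assumptions, 
not how audio software does it: averaging each group of three samples is 
only a crude low-pass filter (real resamplers use much better ones), and 
the bit-depth step simply truncates without dither. Both function names 
are made up for the example.

```python
def decimate_by_averaging(samples, factor=3):
    """Average each full group of `factor` samples into one output sample."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)]


def reduce_bit_depth(samples, from_bits=24, to_bits=16):
    """Drop the least significant bits of signed integer samples (no dither)."""
    shift = from_bits - to_bits
    if shift < 0:
        raise ValueError("target bit depth must not exceed the source bit depth")
    return [s >> shift for s in samples]


# Nine made-up 24-bit samples standing in for a 48,000 Hz recording.
samples_48k_24bit = [0, 256, 512, 768, 1024, 1280, -256, -512, -768]

samples_16k_24bit = decimate_by_averaging(samples_48k_24bit, factor=3)  # 48 kHz -> 16 kHz
samples_16k_16bit = reduce_bit_depth(samples_16k_24bit, 24, 16)         # 24-bit -> 16-bit
```

Nine input samples become three output samples, and each output value 
keeps only its top 16 bits. The discarded samples and bits cannot be 
recovered from the smaller file.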

There is also non-linear, psychoacoustic compression, such as that in 
MP3, which discards parts of the original signal that a perceptual model 
judges to be inaudible, in a dynamic process that varies with the content.

Both types of compression are bad for long-term preservation, but the 
linear type is acceptable for, say, some types of acoustic analysis of 
speech. For instance, for formant analysis of male voices, a sample 
rate of 16,000 Hz is sufficient. The lack of original high-frequency 
content does not harm my analysis in any way, because the low-frequency 
content (below 8,000 Hz) is left intact. The same is not true of MP3 
compression.

I would like to add that the WAVE format is basically a container. It 
can also, in theory, store dynamically compressed data. It is rare, but 
the spec allows it.
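
One way to see the container idea is to read the format tag stored in 
the 'fmt ' chunk, which says what kind of data the file carries. The 
sketch below assumes a well-formed RIFF/WAVE byte string. The tag values 
listed are the commonly registered ones (1 for PCM, 0x0055 for MPEG 
Layer 3). The hand-built MP3 header is a stripped-down illustration, not 
a complete, playable file.

```python
import io
import struct
import wave

FORMAT_TAGS = {0x0001: "PCM", 0x0003: "IEEE float",
               0x0055: "MPEG Layer 3", 0xFFFE: "extensible"}


def read_format_tag(data):
    """Return the format tag from the 'fmt ' chunk of a RIFF/WAVE byte string."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")


# An ordinary PCM file produced by the standard library.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 4)
pcm_tag = read_format_tag(buf.getvalue())

# Illustrative header only: a 'fmt ' chunk whose tag announces MP3 data.
fmt_body = struct.pack("<HHIIHH", 0x0055, 1, 44100, 16000, 1, 0)
mp3_like = (b"RIFF" + struct.pack("<I", 4 + 8 + len(fmt_body)) + b"WAVE"
            + b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body)
mp3_tag = read_format_tag(mp3_like)
```

So a .wav extension by itself does not guarantee uncompressed PCM; the 
format tag is what tells you.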

So, to answer the question of how much sample loss is acceptable, I would 
say that for long-term preservation, none. For other purposes, it is far 
more important to use the right hardware and recording technique. Then 
evaluate what you need these recordings for. If you can do your analysis 
well with a compressed MP3 file at 256 kbps, then that's fine.
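
To put the 256 kbps figure in perspective, the data rate of uncompressed 
PCM is simply sample rate times bit-depth times number of channels. A 
small sketch of that arithmetic follows (the function name is made up, 
and kbps here means 1,000 bits per second):

```python
def pcm_bitrate_kbps(sample_rate, bit_depth, channels=1):
    """Uncompressed PCM data rate in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000


field_master = pcm_bitrate_kbps(48000, 24)      # mono, 48,000 Hz, 24-bit
analysis_copy = pcm_bitrate_kbps(16000, 16)     # mono, 16,000 Hz, 16-bit
ratio_vs_mp3 = field_master / 256               # against a 256 kbps MP3
```

A mono 48,000 Hz / 24-bit recording runs at 1,152 kbps, so a 256 kbps MP3 
is 4.5 times smaller. A mono 16,000 Hz / 16-bit PCM file also comes to 
256 kbps, the same size as that MP3, but it was reduced by the linear 
route rather than the psychoacoustic one.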

Best regards,

Bartek



More information about the Resource-network-linguistic-diversity mailing list