Clan/mp3 timecoding update

Tom Honeyman tom at FATUOUS.ORG
Wed May 31 02:11:06 UTC 2006


... you just nullified my post-to-be wonderfully! :-)

There is another thing to mention, and that is "lossy" compression's
counterpart, lossless compression, although it's not without its own
drawbacks. It allows you to compress audio in the same way you
might zip up a Word document to send over email, without losing any
of the original file. FLAC is an open source lossless compressor. The
only problem is that it doesn't save you that much space... but it's
better than zip or sit because it's designed specifically for audio.
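
To make the zip analogy concrete, here is a minimal Python sketch. It is only an illustration: it uses the general-purpose zlib compressor from the standard library as a stand-in (FLAC is not in the standard library), and the test tone and the make_tone helper are made up for the example. The point is just that decompressing gives back exactly the bytes you started with.

```python
import math
import struct
import zlib

def make_tone(freq_hz=440.0, rate_hz=16000, seconds=1.0):
    """Synthesize 16-bit little-endian mono PCM for a sine tone (illustrative data)."""
    n = int(rate_hz * seconds)
    samples = [int(12000 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
               for i in range(n)]
    return struct.pack("<%dh" % n, *samples)

pcm = make_tone()                   # raw sample data, no header
packed = zlib.compress(pcm, 9)      # general-purpose lossless compression
restored = zlib.decompress(packed)  # undo the compression

print("original bytes:  ", len(pcm))
print("compressed bytes:", len(packed))
print("bit-for-bit identical:", restored == pcm)
```

A synthetic tone is far more repetitive than real speech, so the size reduction it shows says nothing about what to expect on field recordings; only the bit-for-bit round trip carries over.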

Also, personally, and this is pure opinion, I think psychoacoustic
compression of language material is more perceptible to non-native
speakers.

-tom

On 31/05/2006, at 11:51 AM, Bartlomiej Plichta wrote:

> Hello all,
>
> Let me address the issues below.
>
> Linguist - Wangka Maya wrote:
>> I just brought some interested friends into the discussion, Mark  
>> Piggott says:
>>
>> Interesting discussion....... Actually most digital formats  
>> compress using one algorithm or another and are therefore "lossy"  
>> to some extent.
> This is not exactly true. Not all digital audio formats use lossy  
> compression. In fact, there are two popular standards, PCM and DSD,  
> that store raw sample data, without compression. PCM is by far the most
> common in professional applications (e.g., audio CDs), but DSD is
> becoming more and more widely accepted in the pro audio industry.  
> Psychoacoustic compression is a relatively new thing and is used  
> primarily in consumer applications (e.g., iPod, iTunes Music Store,  
> etc.). There are other audio compression standards used, for  
> example, in digital telephony, but they have little relevance to  
> field linguists.
>> For normal beings, like you and me, Grant, mp3 and mp3 vers2 will
>> reproduce sound to a quality that is indistinguishable from the
>> real thing. I personally never use wav format - it's a Microsoft
>> proprietary format not open source.........
> This is not exactly true. There have been numerous psychoacoustic  
> tests done on that and compression is distinguishable. Of course,  
> that depends on a lot of variables, including the compression  
> ratio, the codec used, the playback hardware, the listener's  
> auditory system, etc.
>
> The WAVE standard is not proprietary. It is the native audio  
> standard for Microsoft, but the spec is open. Anyone can write WAVE  
> files, without licensing fees. The biggest problem with the WAVE
> standard is that over the years it has allowed a lot of variation
> in how the various data and metadata chunks are used in the WAVE  
> file. Therefore, the Broadcast WAVE Format (BWF), which is a  
> variant of WAVE, is perhaps a better choice for recording and  
> storage. It is a widely used standard, with very good archival  
> prospects.
>
> The only truly standard-agnostic way of storing PCM data would be
> in a headerless file, but then proper metadata would have to be
> supplied to read this file, e.g., sample rate, bit-depth, byte
> order, etc. This is a solution that some people have used,
> especially in speech engineering circles.
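
A small Python sketch of why that metadata matters for a headerless file. The three sample values and the decode helper are made up for illustration; the example only shows that the same raw bytes decode to different numbers if the byte order is guessed wrongly.

```python
import struct

# Three signed 16-bit samples written as raw little-endian bytes, no header.
samples = [1, -2, 1000]
raw = struct.pack("<3h", *samples)

def decode(raw_bytes, byte_order):
    """Decode headerless signed 16-bit PCM; byte_order is '<' (little) or '>' (big)."""
    count = len(raw_bytes) // 2
    return list(struct.unpack("%s%dh" % (byte_order, count), raw_bytes))

right = decode(raw, "<")   # byte order matches how the data was written
wrong = decode(raw, ">")   # same bytes, wrong assumption

print(right)   # [1, -2, 1000]
print(wrong)   # [256, -257, -6141]
```

Nothing in the file itself flags the second reading as wrong, which is why the sample rate, bit-depth and byte order have to travel with a headerless file by some other route.
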
>> So, how much "lossy" is ok? Is there a minimum standard, beyond  
>> which is unacceptable (I know, least "lossy" is best)? And is .wav  
>> format the least "lossy"?
> Any audio file can be compressed in a lossy process. There are  
> important differences, though. You can "compress" a PCM file (e.g.,  
> WAVE, AIFF, AU, etc.) by reducing its sample rate and bit-depth.  
> Suppose you originally acquired your recording at 48,000 Hz
> and 24-bit. Then you downsample it to 16,000 Hz and lower the
> bit-depth to 16-bit, say to use the file with older software.
> This process is lossy, as it removes original samples and reduces
> the precision of those that remain. This is a linear process.
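
The arithmetic behind that example, as a rough Python sketch. It assumes a mono recording, and the function name and the sample value are made up for illustration. Dropping the low 8 bits is only the simplest way to go from 24-bit to 16-bit; real converters usually also apply dither.

```python
def pcm_bits_per_second(sample_rate_hz, bit_depth, channels=1):
    """Data rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bit_depth * channels

original = pcm_bits_per_second(48000, 24)   # 1,152,000 bits/s
reduced = pcm_bits_per_second(16000, 16)    # 256,000 bits/s
ratio = original / reduced                  # 4.5 times smaller

# Lowering the bit-depth: keep the top 16 bits of a 24-bit sample.
sample_24 = 1234567                  # arbitrary value within 24-bit range
sample_16 = sample_24 >> 8           # 4822
back_to_24 = sample_16 << 8          # 1234432, not the original value
lost = sample_24 - back_to_24        # 135: detail that cannot be recovered

print(original, reduced, ratio)
print(sample_16, back_to_24, lost)
```
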
>
> There is also non-linear, psychoacoustic compression, such as that
> in MP3, which discards information from the original uncompressed
> file in a dynamic, content-dependent process.
>
> Both types of compression are bad for long-term preservation, but  
> the linear type is acceptable for, say, some types of acoustic  
> analysis of speech. For instance, for formant analysis of male  
> voices, a sample rate of 16,000 Hz is sufficient. The lack of
> original high-frequency content does not harm my analysis in any  
> way, because the low-frequency content (below 8,000 Hz) is left  
> intact. The same is not true of MP3 compression.
>
> I would like to add that the WAVE format is basically a container.  
> It can also, in theory, store dynamically compressed data. It is  
> rare, but the spec allows it.
>
> So to answer the question of how much sample loss is acceptable I  
> would say that for long-term preservation, none. For other  
> purposes, it is far more important to use the right hardware and  
> recording technique. Then evaluate what you need these recordings  
> for. If you can do your analysis well with a compressed MP3 file at  
> 256 kbps, then that's fine.
>
> Best regards,
>
> Bartek



More information about the Resource-network-linguistic-diversity mailing list