transcriber-soundFormats

Bartlomiej Plichta (by way of Nicholas Thieberger) plichtab at MSU.EDU
Mon May 1 23:00:37 UTC 2006


PCM files (wav, aiff, au, etc.) are structured very differently from MP3 files. A PCM file contains one file header followed by raw sample data (in one chunk), whereas an MP3 file consists of many small data chunks (frames), each with its own header, and the compression can differ from one chunk to the next. Because of that, an MP3 file carries no duration information of its own, so an application that reads the file has to compute the duration, and the result is only approximate. This is particularly problematic with MP3 files encoded with Variable Bit Rate (VBR). The duration can also change if the MP3 file is created by resampling the wav file, which happens very often. So there will always be some degree of duration mismatch.
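To make the difference concrete, here is a minimal Python sketch (not anything Transcriber itself does, as far as I know). The function names are made up for illustration. It assumes a plain wav file readable by Python's standard wave module, and for the MP3 side it assumes MPEG-1 Layer III, where each frame holds 1152 samples. The wav duration comes straight from the header; the MP3 figures are the kind of estimate a reader application has to fall back on.

```python
import wave

# Samples carried by one MPEG-1 Layer III frame (assumption: MPEG-1, not MPEG-2/2.5).
MPEG1_LAYER3_SAMPLES_PER_FRAME = 1152


def pcm_duration_seconds(wav_path):
    """Exact duration of a wav file, read from its single header."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def cbr_mp3_duration_estimate(audio_bytes, bitrate_kbps):
    """Rough duration of a CBR MP3 from the size of its audio data.

    Only meaningful for constant bit rate; with VBR the bit rate
    changes from frame to frame and this estimate can be far off.
    """
    return audio_bytes * 8 / (bitrate_kbps * 1000)


def mp3_duration_from_frames(frame_count, sample_rate_hz):
    """Duration obtained by counting frames, which works for CBR and VBR.

    The result is always a whole number of frames, so it rarely matches
    the length of the original wav file exactly.
    """
    return frame_count * MPEG1_LAYER3_SAMPLES_PER_FRAME / sample_rate_hz
```

Counting frames means reading the whole file, which is why many applications use the size-based estimate instead, and that is where VBR files cause trouble.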

I would therefore recommend using Constant Bit Rate (CBR) and the same sample rate as the wav file. This will help minimize the problem. I would also avoid conversion software that does not make these parameters visible to the user. Also, most commercial MP3 converters are optimized for music, not speech, which is potentially a problem as well.
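As one way of keeping both parameters visible, the Python sketch below builds a command line for the LAME encoder. It is only an illustration under several assumptions: that the lame command-line program is installed, that its --cbr, -b and --resample options behave as described in its usage text (please check lame --help for your version), and that 128 kbps is an acceptable default, which is my placeholder and not a recommendation. The function name is hypothetical.

```python
import wave


def build_lame_cbr_command(wav_path, mp3_path, bitrate_kbps=128):
    """Build a LAME command that encodes at constant bit rate and
    keeps the sample rate of the source wav file.

    The option names are assumptions taken from LAME's usage text;
    the 128 kbps default is a placeholder.
    """
    # Read the source sample rate from the wav header.
    with wave.open(wav_path, "rb") as wav:
        sample_rate_khz = wav.getframerate() / 1000

    return [
        "lame",
        "--cbr",                                # constant bit rate
        "-b", str(bitrate_kbps),                # bit rate in kbps
        "--resample", f"{sample_rate_khz:g}",   # output rate in kHz, same as input
        wav_path,
        mp3_path,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) once the options have been checked against the installed encoder.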

In my experience, Akustyk (http://bartus.org) does a decent job of converting from wav to MP3 using the LAME codec. It is optimized for speech and produces decent results. Finally, may I ask why one would want to use MP3 in speech research? There's quite a bit of signal degradation involved, particularly with multiple resampling.

Hope this helps.

Best,

Bartek

>
>From: John Giacon <jgiacon at ozemail.com.au>
>Subject: transcriber-soundFormats
>To: rnld list <Resource-Network-Linguistic-Diversity at unimelb.edu.au>
>Hello,
>
>I have been using Transcriber, and the sound files were .aif files. To save room I converted the sound files to mp3 using iTunes. When I reopened the files in Transcriber the timings got out of sync, so that by the end of the file [around 60 min] the text and sound file were about 5 seconds out of sync: the sound has been 'stretched' by the 5 seconds.
>In fact the image of the sound file corresponds to the text divisions, but the actual sound is delayed, so there is 5 seconds of a plain line at the end of the sound image, but it corresponds to actual sound.
>I could do the transcription using the mp3 files, but the problem of the sound and the image not corresponding still remains.
>Any suggestions are very welcome.
>
>John
>
>
>John Giacon
>Christian Brothers, 14 Landsborough St
>Griffith, ACT 2603
>02 6239 6300
>[0421 177 932 when away]
>jgiacon at ozemail.com.au

--
Bartlomiej Plichta
http://bartus.org


