transcriber-soundFormats

Bartlomiej Plichta plichtab at MSU.EDU
Tue May 2 13:33:12 UTC 2006


Very good points, Joe.
Just to clarify, Akustyk will convert PCM files (wav, aiff, au) to MP3 using 
the Lame codec. The conversion is optimized for speech in terms of 
compression, sample rate, bit rate, etc. This, however, will not prevent the 
misalignment of wav and MP3 files in applications such as Transcriber. The 
misalignment results both from the wav-to-MP3 conversion and from the 
application that reads the MP3 files. 
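
To make the duration issue concrete, here is a rough Python sketch. It is an illustration only, not code from Akustyk, Lame, or Transcriber. It assumes MPEG-1 Layer III audio (1152 samples per frame), and all function names are hypothetical. It shows two ways a reader might estimate the duration of an MP3 file, plus a linear rescaling of time codes that only makes sense if the drift really does grow evenly over the file.

```python
# Assumption: MPEG-1 Layer III, which stores 1152 samples in every frame.
SAMPLES_PER_FRAME = 1152


def duration_from_frames(frame_count: int, sample_rate: int) -> float:
    """Duration in seconds obtained by counting frames (valid for CBR and VBR)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return frame_count * SAMPLES_PER_FRAME / sample_rate


def duration_from_size(audio_bytes: int, bitrate_bps: int) -> float:
    """Duration in seconds guessed from file size and one nominal bit rate.

    This is only right if every frame really uses bitrate_bps, so it
    drifts for VBR files or when the nominal rate is wrong.
    """
    if bitrate_bps <= 0:
        raise ValueError("bitrate_bps must be positive")
    return audio_bytes * 8 / bitrate_bps


def rescale_timecode(t: float, reported_duration: float, true_duration: float) -> float:
    """Map a time code made against the MP3 back onto the wav timeline.

    Assumption: the mismatch grows linearly from start to end of the file.
    """
    if reported_duration <= 0:
        raise ValueError("reported_duration must be positive")
    return t * true_duration / reported_duration


if __name__ == "__main__":
    # Hypothetical file: 10 seconds of audio at an average of 120 kbit/s,
    # read by an application that assumes 128 kbit/s throughout.
    audio_bytes = 120_000 * 10 // 8
    print(duration_from_size(audio_bytes, 128_000))  # shorter than 10 s

    # Hypothetical case: a 3600 s wav whose MP3 is reported as 3605 s long.
    print(rescale_timecode(1802.5, 3605.0, 3600.0))
```

In the hypothetical case above, a time code at 1802.5 s in a file reported as 3605 s long maps back to 1800 s in a 3600 s wav file. Whether a linear correction like this is good enough has to be checked against the wav file itself.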

Regards, 

Bartek 

Joe Blythe writes: 

> I had a slightly worse problem than John's but it relates to the same 
> thing.
> I transcribed a number of transcripts in Clan from mp3s. They were all 
> encoded with a fixed rather than a variable bit rate, I think I made the 
> files with Amadeus. I also did this to save hard drive space. Some time 
> later I received a warning about mp3s and I checked the time coding 
> against corresponding wav files and found that the timecoding was out of 
> alignment. At the end of a twenty minute transcription the bullet points 
> didn't select one word of the correct utterance although at the beginning 
> of the transcript the match was fine. I now have the problem of recoding 
> or adjusting the timecoding of those transcripts and realigning them to 
> wav. 
> 
> It's possible that, as Bartek suggests, converting wav to mp3s using 
> Akustyk might be an improvement, but I would thoroughly discourage 
> transcription from mp3s. If hard drive space is an issue then I suspect a 
> hardware solution to the problem would be infinitely better than a 
> software solution. 
> 
> Cheers
> Joe 
> 
> On 02/05/2006, at 9:00 AM, Bartlomiej Plichta (by way of Nicholas 
> Thieberger) wrote: 
> 
>> The PCM (wav, aiff, au, etc.) files have a much different structure than 
>> an MP3 file. While a PCM file contains one file header followed by raw 
>> sample data (in one chunk), an MP3 file has many small data chunks, each 
>> with its own header, and each containing different compression. There is 
>> no duration information in an MP3 file because of that, so an application 
>> that reads the file must compute it, roughly. This is particularly 
>> problematic with MP3 files encoded with Variable Bit Rate (VBR). Duration 
>> variation may also occur if the MP3 file is created by resampling the wav 
>> file, which happens very often. So there will always be some degree of 
>> duration mismatch. 
>> 
>> I would therefore recommend using Constant Bit Rate (CBR) and the same 
>> sample rate as the wav file. This will help minimize the problem. I would 
>> also avoid using conversion software where these parameters are not 
>> transparent to the user. Also, most commercial MP3 converters are 
>> optimized for music, not speech. That's potentially a problem, as well. 
>> 
>> In my experience, Akustyk (http://bartus.org) does a decent job 
>> converting from wav to MP3 using the Lame codec. It is optimized for 
>> speech and produces decent results. Finally, may I ask why one would want 
>> to use MP3 in speech research? There's quite a bit of signal degradation 
>> involved, particularly with multiple resampling. 
>> 
>> Hope this helps. 
>> 
>> Best, 
>> 
>> Bartek 
>> 
>>> 
>>> From: John Giacon <jgiacon at ozemail.com.au>
>>> Subject: transcriber-soundFormats
>>> To: rnld list <Resource-Network-Linguistic-Diversity at unimelb.edu.au>
>>> Hello, 
>>> 
>>> I have been using Transcriber, and the sound files were .aif files. To 
>>> save room I converted the sound files to mp3 using iTunes. When I 
>>> reopened the files in Transcriber the timings got out of sync, so that 
>>> by the end of the file [around 60 min] the text and sound file were 
>>> about 5 seconds out of sync: the sound has been 'stretched' by the 5 
>>> seconds.
>>> In fact the image of the sound file corresponds to the text divisions, 
>>> but the actual sound is delayed, so there is 5 seconds of a plain line 
>>> at the end of the sound image, but it corresponds to actual sound.
>>> I could do the transcription using the mp3 files, but the problem of the 
>>> sound and the image not corresponding still remains.
>>> Any suggestions are very welcome. 
>>> 
>>> John 
>>> 
>>> 
>>> John Giacon
>>> Christian Brothers, 14 Landsborough St
>>> Griffith, ACT 2603
>>> 02 6239 6300
>>> [0421 177 932 when away]
>>> jgiacon at ozemail.com.au 
>> 
>> --
>> Bartlomiej Plichta
>> http://bartus.org 
>> 
>> 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Joe Blythe 
> 
> Department of Linguistics
> Transient Building
> University of Sydney
> NSW 2006 
> 
> 
 


