saving corrupted file

Bill Poser billposer2 at GMAIL.COM
Sat Aug 28 18:33:53 UTC 2010


As Sebastian says, the problem is most likely the header. WAV files begin
with a header which in the simplest standard-conforming case is 44 bytes
long. This header contains information about the representation of the audio
and its duration. When recording in real time, it is of course impossible to
fill in the duration information correctly - you have to leave those four
bytes blank, or set them to 0, then come back and fill in the correct
information once the recording is complete and you know its duration.
Anything that terminates the recording before it is possible to go back and
clean up the header will result in bad duration information.
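
For anyone who wants to look at what a header actually says, here is a
minimal Python sketch that decodes the 44-byte header described above. It
assumes the simplest layout (RIFF, then a 16-byte "fmt " chunk, then "data")
and rejects anything else. The function name and the dictionary keys are my
own choices for illustration, not part of any standard tool.

```python
import struct

def read_canonical_header(header: bytes) -> dict:
    """Decode a 44-byte canonical PCM WAV header (assumed layout)."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    # All multi-byte fields in a WAV header are little-endian.
    (riff, riff_size, wave_id, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])
    if (riff, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte WAV header")
    return {
        "riff_size": riff_size,          # bytes 4-7: file length minus 8
        "audio_format": audio_format,    # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,          # bytes 40-43: audio length in bytes
        # Duration follows from the byte count and the byte rate.
        "duration_seconds": data_size / byte_rate if byte_rate else None,
    }
```

A data_size of 0 in a file that is obviously not empty is a good sign that
the recording was interrupted before the header could be filled in.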

Depending on how your device writes data to the disk, which is a function
both of the drive technology and the software, the most recently recorded
audio data may also be missing or corrupted by the loss of power.

As already noted, one approach is simply to remove the header, then convert
the resulting raw file back to WAV. In this case, you may need to provide
the converter with information about the audio since it can't get it from
the header. Note, by the way, that if you tell the converter that your
corrupted WAV file is a raw file, it does not actually strip the header -
after all, you've told it there isn't any. Rather, what it does is treat the
header as the first stretch of audio data. The result is that the first few
samples of your new audio file will be garbage. This won't make any real
difference though, since the garbage is so short: 44 bytes is 22 samples of
16-bit mono audio, which lasts about 1.4 milliseconds at 16 kHz and about
0.5 milliseconds at 44.1 kHz.
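
The first approach can be sketched in Python using the standard wave
module, which writes a correct header for you. This is only a sketch: it
assumes the damaged file has the plain 44-byte header, and you have to
supply the channel count, sample width in bytes, and sampling rate yourself,
since the sketch does not read them from the file. The function names are
invented for this example.

```python
import os
import wave

def rewrap_as_wav(src, dst, channels, sample_width, sample_rate,
                  header_bytes=44):
    """Copy the audio after an assumed 44-byte header into a fresh WAV file.

    Returns the number of frames written.
    """
    frame_size = channels * sample_width
    payload = os.path.getsize(src) - header_bytes
    if payload < 0:
        raise ValueError("file is shorter than the assumed header")
    # Drop a trailing partial frame left by the interrupted write.
    remaining = payload - payload % frame_size
    written = 0
    chunk_bytes = frame_size * 65536  # copy in whole frames
    with open(src, "rb") as fin, wave.open(dst, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(sample_rate)
        fin.seek(header_bytes)
        while remaining > 0:
            chunk = fin.read(min(remaining, chunk_bytes))
            if not chunk:
                break
            out.writeframesraw(chunk)
            written += len(chunk)
            remaining -= len(chunk)
        # Closing the wave object fills in the size fields in the header.
    return written // frame_size

def header_as_audio_ms(channels, sample_width, sample_rate, header_bytes=44):
    """Duration, in milliseconds, of a header misread as audio samples."""
    return 1000.0 * header_bytes / (channels * sample_width * sample_rate)
```

For a 16-bit mono recording at 16 kHz one would call something like
rewrap_as_wav("broken.wav", "fixed.wav", 1, 2, 16000). The second function
just does the arithmetic for the case where the header is not stripped.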

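The other approach is to edit the WAV file header, which, however, takes a
bit of computing expertise. The duration is stored as the length of the audio
data in bytes, expressed as a 4-byte little-endian unsigned integer. If the
WAV file is in the simplest standard-conforming format, those four bytes will
be bytes 40-43 (assuming that the first byte of the file is numbered zero).
The overall RIFF chunk size at bytes 4-7, which should equal the file length
minus 8, is normally left unfinished by an interrupted recording as well and
should be corrected at the same time.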
Unfortunately, it is not uncommon to encounter "WAV files" that do not
conform to the standard, and it is also common for them to be
standard-conforming but contain additional, usually unnecessary, chunks.
(The WAV format is, from a linguist's point of view, much more complex than
necessary. WAV files potentially contain all sorts of stuff of interest only
to the entertainment industry, such as playlists and cue lists.)
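
Editing the header can be sketched in Python as follows. Rather than
trusting that the size lives at bytes 40-43, this walks the chunk list until
it finds the "data" chunk, so it also copes with files that carry extra
chunks ahead of the audio. It assumes that the audio runs to the end of the
file, which is what one expects from an interrupted recording, and that the
file is under 4 GB. The function name is made up for this example, and it
modifies the file in place, so try it on a copy.

```python
import os
import struct

def repair_wav_sizes(path):
    """Rewrite the RIFF and data size fields of a WAV file in place.

    Returns (offset_of_data_size_field, data_size_in_bytes).
    Assumes the data chunk extends to the end of the file.
    """
    file_size = os.path.getsize(path)
    with open(path, "r+b") as f:
        if f.read(4) != b"RIFF":
            raise ValueError("no RIFF signature")
        f.seek(8)
        if f.read(4) != b"WAVE":
            raise ValueError("no WAVE signature")
        pos = 12  # first chunk starts right after "RIFF", size, "WAVE"
        while pos + 8 <= file_size:
            f.seek(pos)
            chunk_id, chunk_size = struct.unpack("<4sI", f.read(8))
            if chunk_id == b"data":
                data_size = file_size - (pos + 8)
                f.seek(pos + 4)
                f.write(struct.pack("<I", data_size))      # audio length
                f.seek(4)
                f.write(struct.pack("<I", file_size - 8))  # RIFF chunk size
                return pos + 4, data_size
            # Skip this chunk; chunks are padded to an even length.
            pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found")
```

On a file in the simplest format the returned offset is 40, matching the
byte positions given above.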

For those interested, my own lecture notes on audio files are at:
http://www.billposer.org/Linguistics/Computation/LectureNotes/AudioData.html#wave
and a beautifully illustrated explanation of WAV file format can be found
at:
http://ccrma-www.stanford.edu/courses/422/projects/WaveFormat/


More information about the Resource-network-linguistic-diversity mailing list