auto alignment of transcript and audio?

Sun Apr 10 09:06:47 UTC 2011

On 9 April 2011 17:33, Nick Thieberger <thien at unimelb.edu.au> wrote:
> Has anyone had experience of software that takes a textual transcript
> and aligns it with the media it transcribes? I know it exists for
> major languages but have not seen it working and do not know of the
> software. It would be interesting to know how applicable it could be
> to the many hours of (handwritten/typed) transcripts of recordings we
> have in the PARADISEC collection.

This task is sometimes called "forced alignment".  It requires an
acoustic model of the sounds of the language, together with a way to
determine the canonical pronunciation of each word (e.g. a
pronunciation lexicon).  The task is more difficult if the transcript
has gaps or corrections, if there is variability in the speech (e.g.
multiple dialects), or if the recording is not high fidelity.
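
To make the idea concrete, here is a minimal sketch of forced alignment
as a dynamic-programming search.  Everything in it is an assumption for
illustration: the two-word lexicon, the single number per audio frame,
and the per-phone means standing in for an acoustic model.  Real
aligners use multi-dimensional features and statistical models, but the
monotonic search has the same shape.

```python
# Toy forced alignment. The lexicon, phone means and frame values below
# are hypothetical; real systems use MFCC-like vectors and HMM/neural models.
LEXICON = {"hi": ["h", "ai"], "you": ["j", "u"]}          # word -> canonical phones
PHONE_MEANS = {"h": 1.0, "ai": 5.0, "j": 3.0, "u": 8.0}   # 1-D stand-in acoustic model


def force_align(words, frames):
    """Return (word, phone, start, end) spans; end is exclusive, in frames."""
    # Expand the transcript into its phone sequence via the lexicon.
    phones = [(w, p) for w in words for p in LEXICON[w]]
    n, m = len(frames), len(phones)
    if n < m:
        raise ValueError("need at least one frame per phone")

    def local(t, j):
        # Squared distance between frame t and the mean of phone j.
        return (frames[t] - PHONE_MEANS[phones[j][1]]) ** 2

    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]   # 1 = entered phone j at frame t
    cost[0][0] = local(0, 0)
    for t in range(1, n):
        for j in range(min(t + 1, m)):
            stay = cost[t - 1][j]
            move = cost[t - 1][j - 1] if j > 0 else inf
            back[t][j] = 1 if move < stay else 0
            cost[t][j] = min(stay, move) + local(t, j)

    # Trace back from the last frame of the last phone.
    labels, j = [0] * n, m - 1
    for t in range(n - 1, -1, -1):
        labels[t] = j
        j -= back[t][j]

    # Merge runs of identical labels into spans.
    spans, start = [], 0
    for t in range(1, n + 1):
        if t == n or labels[t] != labels[start]:
            word, phone = phones[labels[start]]
            spans.append((word, phone, start, t))
            start = t
    return spans


if __name__ == "__main__":
    frames = [1.1, 0.9, 5.2, 4.8, 5.1, 3.2, 2.9, 8.1, 7.9]
    for span in force_align(["hi", "you"], frames):
        print(span)
```

With these made-up frames, "hi" is assigned frames 0-4 and "you"
frames 5-8.  A transcript with gaps or corrections breaks the
assumption that every phone in the transcript appears in the audio, in
order, which is one reason such transcripts are harder to align.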

Once a transcript is aligned, we have new data on how words are
pronounced, and the acoustic model can be retrained.  The retrained
model in turn produces better forced alignments, and the cycle
repeats.  The obvious question, then, is how much manually aligned
data is needed to bootstrap the process.  I think this is an open
question for linguistic field recordings.
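
The align-and-retrain cycle can be sketched in the same toy setting.
Again the details are assumptions, not a description of any particular
toolkit: each frame is a single number, the "acoustic model" is one
mean per phone, and the rough seed means stand in for a model trained
on a small amount of manually aligned data.

```python
# Toy bootstrap loop: align with the current model, re-estimate the model
# from the alignment, repeat. All data and names here are hypothetical.

def align(phones, frames, means):
    """Monotonic minimum-cost alignment; returns one phone index per frame."""
    n, m = len(frames), len(phones)
    if n < m:
        raise ValueError("need at least one frame per phone")
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]   # 1 = entered phone j at frame t
    cost[0][0] = (frames[0] - means[phones[0]]) ** 2
    for t in range(1, n):
        for j in range(min(t + 1, m)):
            stay = cost[t - 1][j]
            move = cost[t - 1][j - 1] if j > 0 else inf
            back[t][j] = 1 if move < stay else 0
            cost[t][j] = min(stay, move) + (frames[t] - means[phones[j]]) ** 2
    path, j = [0] * n, m - 1
    for t in range(n - 1, -1, -1):
        path[t] = j
        j -= back[t][j]
    return path


def bootstrap(utterances, seed_means, rounds=3):
    """Alternate forced alignment and re-estimation of the phone means."""
    means = dict(seed_means)
    for _ in range(rounds):
        sums, counts = {}, {}
        for phones, frames in utterances:
            for idx, value in zip(align(phones, frames, means), frames):
                phone = phones[idx]
                sums[phone] = sums.get(phone, 0.0) + value
                counts[phone] = counts.get(phone, 0) + 1
        # Retrain: each phone mean becomes the average of its aligned frames.
        for phone in sums:
            means[phone] = sums[phone] / counts[phone]
    return means


if __name__ == "__main__":
    utterances = [
        (["a", "b"], [1.0, 1.2, 0.8, 6.0, 6.2]),
        (["b", "a"], [5.8, 6.0, 1.1, 0.9]),
    ]
    print(bootstrap(utterances, {"a": 2.0, "b": 4.0}))
```

In this toy case the rough seed is good enough that the means settle
after one round.  With a poor seed the first alignments are wrong and
the retrained model inherits the errors, which is the practical form of
the question about how much manually aligned data is needed.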

-Steven Bird