[Corpora-List] are there corpora of fast speech?

Ute Römer ute.roemer at uni-koeln.de
Thu Jan 16 08:50:57 UTC 2003


Dear Dinoj and others,

Re: fast spontaneous speech. Have you checked Richard Cauldwell's
"speechinaction" website for articles on the topic and information on his
Streaming Speech course? Streaming Speech is an electronic textbook for
advanced learners who want to improve their listening and pronunciation
skills. The course uses recordings of real "messy" fast speech with speeds up
to 490 words per minute (Richard's definition of "fast speech" is "220 WPM or
faster"). Richard's recordings are great for investigating phonological processes
and strategies of speakers, I think. I'm planning to use these materials in an
applied-no-tidy-examples-but-just-real-life-data phonology course next winter
term, but I'm not sure this is what you need for your research project. The URL
is http://www.rtc.pwp.blueyonder.co.uk/

Sorry for underestimating people's willingness to phonetically transcribe
spontaneous speech. I wouldn't have thought that corpora of that kind existed (I
wouldn't want to do it myself, I have to admit, and I wouldn't want anyone to
rely on my transcriptions). Must be extremely time-consuming.

Good luck hunting for more speech data!

Best... Ute


Dinoj Surendran wrote:

> My thanks to David Purdy, Eric Atwell and Ute Romer for their responses.
> I'll have a look at the corpora suggested, though I suspect none are going
> to serve me the required information on a silver platter :) Transcribing
> a *few* of Kennedy's sentences would certainly be an interesting exercise.
>
> A few clarifications on my question. The problem I have in mind is
> investigating phonological rules that only apply in fast speech. An
> example of the kind of rule I have in mind would be stops
> at the end of unstressed syllables of English getting deleted or
> glottalised. As I am quite unfamiliar with the fast speech literature,
> finding a corpus of it seemed a good starting point.
>
> [Ute: there are plenty of phonetically transcribed corpora around;
> TIMIT has 6300 (read, not spontaneous) sentences, each with about 30-40
> phones. That's still less than a fifth of a million words, true...
> Switchboard is a much smaller corpus, of spontaneous speech, also
> phone-transcribed. I haven't looked at speeds there.]
>
> As for the definition of 'fast', I'm not sure. Word rate is probably a
> better definition than phone rate since limits on articulatory
> apparatus ought to lead to fast speakers eating phones instead of words.
>
> [Interesting statistic for general reference: in TIMIT, the distribution
> of phone rate is close to Normal: mean = 13.71, stdev = 1.95
> (Mirghafori, Fosler and Morgan 1995).]
>
> Dinoj Surendran


