[Corpora-List] Sum: Speech Corpus for Neural Network Training

Scott Drellishak sfd at u.washington.edu
Tue Aug 24 01:18:23 UTC 2004


A few weeks ago, I posted a request to both the Linguist List and the
Corpora-List for information about a particular kind of speech corpus.
This is the (somewhat belated) summary.

I described the corpora we are seeking as follows:

"We are looking for a corpus that contains samples of many speakers
producing many vowels (preferably in a less reduced register) that also
contains human-validated pitch and formant (F1, F2, and F3) tracks and, if
possible, bandwidth information.  A corpus that contains more than just
vowels is fine, since we can discard sections of the samples that do not
suit our needs."

I received five replies:

1)  John Lawler suggested MICASE (Michigan Corpus of Academic
    Spoken English), which is available here:

    http://www.lsa.umich.edu/eli/micase/micase.htm

2)  Lesley Carmichael suggested I post my request to the
    Corpora-List.

3)  Jane Edwards pointed me to the Switchboard Transcription
    Project:

    http://www.icsi.berkeley.edu/real/stp/index.html

4)  Susana Sotillo wrote, "At a recent conference (CALICO) I
    saw a demonstration of the Speechcalator (Allen Blackwell
    and associates).  Why don't you write him at Carnegie-
    Mellon."

5)  Linda Bawcom offered an hour and a half of taped
    conversation that she used in her MA research.

Many thanks to everyone who replied.

Scott Drellishak
University of Washington
Seattle, WA


