[Corpora-List] Speech Corpus for Neural Network Training

Sat Jun 26 11:47:28 UTC 2004

[I posted this recently to the Linguist List, and a colleague suggested I
ought to try posting it here as well.]

I am involved in a research project whose goal is to produce a software
system for the control of electronic devices using continuous variables
extracted from human speech.  Part of this system will be a neural network
that recognizes various vowels and produces tracks of pitch and formant
frequencies.  Training the neural network will require a large amount of
data that we're hoping to get from an existing corpus, rather than creating
it ourselves.

We are looking for a corpus with samples of many speakers producing many
vowels (preferably in a less reduced register), together with
human-validated pitch and formant (F1, F2, and F3) tracks and, if possible,
bandwidth information.  A corpus that contains more than just vowels is
fine, since we can discard sections of the samples that do not suit our
needs.

If anyone knows of a corpus like this, whether freely distributed or
available for a fee, I would like to know how to obtain it.

I will post a summary of the replies that I receive.  Thanks in advance for
your time.

Scott Drellishak
University of Washington
Seattle, WA