24.3117, Qs: Issues in creating a speech corpus

Wed Jul 31 16:16:45 UTC 2013

LINGUIST List: Vol-24-3117. Wed Jul 31 2013. ISSN: 1069 - 4875.

Subject: 24.3117, Qs: Issues in creating a speech corpus

Moderator: Damir Cavar, Eastern Michigan U <damir at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Mateja Schuck, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
       <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Alex Isotalo <alx at linguistlist.org>
================================================================  

Date: Wed, 31 Jul 2013 12:16:30
From: Pankaj Dwivedi [pankaj.linguistics at gmail.com]
Subject: Issues in creating a speech corpus

Hello all, 

I am working on a lesser-known dialect of Hindi. I have around 15 hours of speech data recorded with a professional recorder, an Olympus LS100. The data mainly consist of free discourse on a variety of topics, such as stories, daily routines, recipes, and personal experiences, as well as common words in isolation. I have also created text files/TextGrids for the audio files using Praat. I am wondering whether I can create a small speech corpus out of this material. If so, how, and what should my next step be? I would also like to build a TTS system for the dialect. Is that possible? Please explain the process to me step by step.

Your help will be duly acknowledged in research publications, possibly in the form of co-authorship.

Thank you!

Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics
